Machine Learning for Automated Topic Discovery

Quickly discover topics, themes and linguistic patterns in any document collection.

  • Extract the major topics (themes) and their proportions in a set of research projects.
  • Find persons, places, organizations or locations in a document collection.
  • Extract and curate a vocabulary from a document collection, including relevant parts of speech.
  • Use your thesaurus or taxonomy for a particular domain of knowledge to find documents of interest.
  • Build a topic map for your website.
  • Find interesting patterns in a collection of user-defined categories or symbols (e.g., genetic sequences, instrument measurements, music notation).
About Topix

Knowledge Discovery

Topix is a modern web-based knowledge discovery application designed to provide the innovative data analyst, researcher or investigative journalist the tools to share insights quickly and effectively.

Topic Modeling

Topix uses a process called topic modeling, a special case of machine learning, to automatically discover and extract implicit ("hidden") topics from a document collection.

Natural Language Processing (NLP)

Topix includes state-of-the-art NLP algorithms that provide you with the power to further refine your discovery process by focusing on specific parts of speech or special entities of interest such as persons, places, organizations, locations, values and dates.

Web Enabled

By providing this fast online web service, we hope to significantly increase experimentation, enable more rapid learning, and eventually build a shared repository of best practices, document collections, and models that can be "mined" to accelerate progress in the practical use of topic modeling.

Bridge to Academic Research

We're continually evaluating the latest research for breakthroughs in this field. We hope to provide a useful bridge between the recent and emerging innovations in academic research and the practical challenges that you face every day.

Custom Implementations and Training

We provide custom cloud or on-premise implementations and training for organizations with large document sets, database integration, website harvesting, and internal knowledge sharing. We are committed to re-investing a portion of the revenues from these custom engagements back into the public version of Topix.

Knowledge Matters!

We strongly believe that the discovery and sharing of real knowledge is fundamental to any free and thriving society. We hope Topix will help democratize and spread the understanding and use of topic discovery and exploration.

Topix Explore: Features Overview

Below we provide an overview of the many topic modeling options, reports, and output available using "Explore."

NEW: Topix Network Explorer

The Topix Network Explorer is accessed from the topic listing table. Documents within the selected topic that pass a minimum similarity threshold are passed to the explorer. Click on one of the bars to change the minimum similarity used to show links among documents. You can also search to find and highlight documents, use the "Show Info" widget to show the details of each document as you hover over a node, and use the "Size" and "Color" options if you included data related to dollars, categories, or years.
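The thresholding idea can be sketched in a few lines. This is only an illustration: the similarity measure Topix uses is not stated here, so cosine similarity over per-document topic proportions is an assumption, and the names `cosine`, `document_links`, and `doc_vectors` are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def document_links(doc_vectors, min_similarity):
    """Return (doc_a, doc_b, similarity) for every pair at or above the threshold."""
    links = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(doc_vectors.items()), 2):
        sim = cosine(vec_a, vec_b)
        if sim >= min_similarity:
            links.append((name_a, name_b, round(sim, 3)))
    return links

# Hypothetical topic proportions for three documents (assumed data).
doc_vectors = {
    "doc1": [0.8, 0.1, 0.1],
    "doc2": [0.7, 0.2, 0.1],
    "doc3": [0.1, 0.1, 0.8],
}

strict_links = document_links(doc_vectors, min_similarity=0.9)
loose_links = document_links(doc_vectors, min_similarity=0.25)
```

Raising the threshold keeps only the strongest links, which is the effect of choosing a different bar in the explorer.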

Vocabulary Word Cloud

The top 200 vocabulary items in your document corpus are displayed in a word cloud, with the size of each word proportional to its frequency. When you choose to limit parsing to a selected part of speech (noun, verb, adjective) or entity (value, date, year, person, place, organization), only the selected words are used in the calculation and display.
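As a rough sketch of frequency-proportional sizing, the snippet below keeps the most frequent items and scales each one relative to the most frequent word. The function name `word_cloud_sizes` and the `max_size` parameter are hypothetical, and the actual scaling Topix applies may differ.

```python
from collections import Counter

def word_cloud_sizes(tokens, top_n=200, max_size=60):
    """Map each of the top_n most frequent tokens to a display size.

    The most frequent token gets max_size; every other token is scaled
    in proportion to its count.
    """
    counts = Counter(tokens).most_common(top_n)
    if not counts:
        return {}
    top_count = counts[0][1]
    return {word: max_size * count / top_count for word, count in counts}

# Hypothetical token stream after parsing (assumed data).
tokens = ["topic"] * 4 + ["model"] * 2 + ["word"]
sizes = word_cloud_sizes(tokens)
top_two = word_cloud_sizes(tokens, top_n=2)
```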

Corpus Statistics: Table

This panel provides a bird's-eye view of the size and composition of your document collection (corpus).

Corpus Statistics: Charts

This is one of several chart options that complement the corpus statistics table.

Vocabulary Catalog

Use the Vocabulary Catalog on its own or to help to fine-tune your topic model. Find the prevalence of each vocabulary item in your document collection, and jump to the documents containing each word. You can choose to exclude additional words, and then re-run your model with these words excluded. You can continue this fine-tuning as long as you wish.
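The catalog-and-exclude loop described above might look something like this in miniature. It is a sketch under simple assumptions: whitespace tokenization, prevalence measured as the share of documents containing a word, and hypothetical names (`vocabulary_catalog`, `prevalence`) that are not part of Topix.

```python
from collections import defaultdict

def vocabulary_catalog(documents, exclude=frozenset()):
    """Map each word to the sorted ids of the documents that contain it."""
    catalog = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            if word not in exclude:
                catalog[word].add(doc_id)
    return {word: sorted(ids) for word, ids in catalog.items()}

def prevalence(catalog, total_docs):
    """Share of documents in which each word appears."""
    return {word: len(ids) / total_docs for word, ids in catalog.items()}

# Hypothetical three-document collection (assumed data).
documents = {
    "a": "gene expression in yeast",
    "b": "gene therapy trial",
    "c": "trial results in mice",
}

catalog = vocabulary_catalog(documents)
shares = prevalence(catalog, len(documents))

# Second pass: add "in" to the exclude list and rebuild the catalog.
refined = vocabulary_catalog(documents, exclude={"in"})
```

Each pass can feed a longer exclude list into the next one, mirroring the repeated fine-tuning described above.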

Topic Details

For each topic we display the top words correlated with that topic, and you can drill down from each topic to the documents associated with it. You can search and filter by word, topic, or percentage.

Advanced Analytics: Word Contributions to Each Topic

Topix includes several powerful visualizations that help you understand at a glance how well each topic is described by the top words generated for each topic. This one shows the contributions for the top 20 words generated for each topic.

Advanced Analytics: Vocabulary Counts by Topic Specificity

At a glance you can discover vocabulary words that have a low count and high specificity. Research has shown that these words should be considered for removal to improve topic coherence.
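One way to picture this filter is shown below. The definition of specificity used by Topix is not given here, so the sketch assumes a simple one: the share of a word's occurrences that fall in its single dominant topic. The thresholds and the name `removal_candidates` are likewise hypothetical.

```python
def removal_candidates(word_topic_counts, max_count=3, min_specificity=0.9):
    """List words that are rare overall yet concentrated in one topic.

    word_topic_counts maps each word to its per-topic counts.
    Specificity here is an assumed measure: max topic count / total count.
    """
    candidates = []
    for word, counts in word_topic_counts.items():
        total = sum(counts)
        if total == 0:
            continue
        specificity = max(counts) / total
        if total <= max_count and specificity >= min_specificity:
            candidates.append(word)
    return sorted(candidates)

# Hypothetical per-topic counts for four words across three topics.
word_topic_counts = {
    "protein": [40, 2, 1],   # specific but frequent: kept
    "xylem": [0, 2, 0],      # rare and specific: candidate
    "study": [15, 14, 16],   # frequent and spread out: kept
    "zqx": [1, 0, 0],        # rare and specific: candidate
}

candidates = removal_candidates(word_topic_counts)
```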

Standard Options

You can explore using our sample data sets or process your own document sets. Choose the number of topics to explore and experiment with the number of iterations to perform.
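To give a feel for what the number of topics and the number of iterations control, here is a minimal sketch of one common topic modeling technique: latent Dirichlet allocation fitted by collapsed Gibbs sampling. The algorithm Topix runs is not stated on this page, so this is an assumption for illustration only; `fit_topics`, `alpha`, `beta`, and `seed` are hypothetical names and defaults.

```python
import random

def fit_topics(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                        # tokens of doc d in topic k
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                                      # tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    # Each iteration resamples the topic of every token in turn.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions for each document.
    proportions = []
    for d, doc in enumerate(docs):
        denom = len(doc) + alpha * num_topics
        proportions.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return proportions

# Hypothetical tokenized documents (assumed data).
docs = [["gene", "dna", "gene"], ["music", "note", "music"], ["gene", "dna"]]
proportions = fit_topics(docs, num_topics=2, iterations=50)
```

More topics give a finer-grained split of the corpus, and more iterations give the sampler longer to settle, at the cost of run time.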

For English text corpora, we leverage state-of-the-art Natural Language Processing (NLP) options for parsing text. You can choose to include only selected parts of speech (nouns, verbs, adjectives) or entities (persons, places, organizations).

Advanced Options

There are powerful advanced options for fine-tuning your topic modeling exploration.

Topic Model Summary

This panel captures key information related to each topic model you work on, including source files and both standard and advanced options.

Words by Topic

Filter or sort by any combination of topic, word or token count to develop a detailed understanding of the composition of your corpus.

Download Results

You have a wide variety of export options for further analysis, visualization, and exploration. For example, export your updated exclude list (stopwords) to use in future explorations. Other options include topic summaries, topic correlations, topic distribution by document, word distributions by topic, and word prevalence measures.

Get in touch

Phone: 410.849.9776


Address: Annapolis, MD, U.S.A.