Archive for the ‘data mining’ Category

ISC Consulting Powers Pytheas AI with BrainDocs

Friday, June 24th, 2016

We are pleased to publish ISC’s submission under the DIUx program. The new Defense Innovation Unit Experimental (DIUx) “serves as a bridge between those in the U.S. military executing on some of our nation’s toughest security challenges and companies operating at the cutting edge of technology.” Powered by ai-one’s NathanICE artificial intelligence core for language, Pytheas AI will give ISC the technology to assist researchers and help our governments keep us safe. Some proprietary sections have been deleted in the version below.

ISC White Paper for DIUx Technology Area of Interest: Knowledge Management

By Jeremy Toor, ISC Consulting Group

Executive Summary

ISC/ai-one will develop a prototype using Pytheas artificial intelligence (Pytheas AI) to provide automated intelligent information management, or knowledge management (KM), of multiple data sources. Pytheas AI fingerprints the flow of data from almost any source, including chat, email, message traffic, and other data. This AI core then supports the user in a publish/subscribe architecture, building knowledge from the fingerprinted data through queries and intuitive alerts that recognize contextual differences and their importance.

The abundant quantity of data available to users, analysts, and commanders today can make it challenging to build a concise and accurate picture from which dynamic assessments can be made. Both Command and Control (C2) and intelligence systems are largely data-centric. Users who must make strategic and tactical decisions will benefit from a task-centric user experience that manages information as it is created and presented, and distills many sources of data into a manageable data flow. This user experience, facilitated by Pytheas AI, will deliver a KM engine that accelerates the decision-making process.

Through Pytheas AI the user will be presented with data that has gone through automated processes to be categorized, tagged, and ranked according to its value in the current context of operations.  Pytheas AI will give the user flexibility to tailor their focus area and pull information from a wide breadth of sources as they build situational awareness and confidence to take action.

Phasing

ISC/ai-one proposes a three-phase project. Phase 1 will include one week for initial installation, configuration and user training. Phase 2 will include a five-month period to support data ingestion, intelligent agent training and dashboard customization. Phase 3 will include a three-week evaluation and close-out.

Technology

Pytheas AI is built with an artificial intelligence core to collect, organize and analyze language to uncover key links and patterns within large volumes of unstructured text.  The application empowers analysts to find the relationships necessary to discover, manage, process and exploit data.  Key features and attributes of Pytheas include:

  • Discovery of Concepts through the use of Intelligent Agents
  • Agent collections can be built from existing plans, roadmaps and strategy documents
  • DoD analysts can use common KM collections or build and share concept agents
  • Agents provide classification for query and tagging of documents
  • Application core is language independent
  • Fast and lightweight running on PC class machines or VMs

Pytheas AI is built upon ai-one’s BrainDocs software application (with the NathanICE API core), a commercially ready and viable technology that has been applied to several use cases similar to the requirements in the knowledge management technology area of interest that DIUx is seeking to address. Our prototype for KM is ready for demonstration using sample data.

Pytheas uses the ability of ai-one’s proprietary NathanICE API to discern the patterns of words and associations that are central to the meaning of all or a portion of a text document (much as the brain does). Nathan extracts these keywords and associations, filtering out the noise to create a proprietary fingerprint array of the concept that can be used in many ways.

Pytheas uses the fingerprint of a trained concept to find (rank) similar concepts within a corpus of information (documents, websites, databases) and returns paragraph-level results sorted by “similarity”. These results support a variety of workflows in enterprise compliance, classification, search and knowledge management.  Agent similarity scores are exported to Excel or your database to support analytics and BI tools. This can be done by the analyst for small ad hoc studies.  Agents can also be used to code years of legacy data without additional training.
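
To make this workflow concrete, below is a minimal Python sketch of the rank-and-export step. The fingerprint() and similarity() functions are illustrative stand-ins (a simple word-count vector and cosine similarity), not the proprietary NathanICE fingerprint; only the overall flow of scoring each paragraph against a trained concept, sorting by similarity, and exporting the scores mirrors the description above.

    import csv
    import re
    from collections import Counter
    from math import sqrt

    def fingerprint(text):
        """Stand-in fingerprint: a word-count vector (illustrative only)."""
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def similarity(fp_a, fp_b):
        """Cosine similarity between two stand-in fingerprints."""
        dot = sum(fp_a[w] * fp_b[w] for w in set(fp_a) & set(fp_b))
        norm = sqrt(sum(v * v for v in fp_a.values())) * sqrt(sum(v * v for v in fp_b.values()))
        return dot / norm if norm else 0.0

    # A trained concept and a small corpus of (source, paragraph number, text) tuples.
    concept = fingerprint("electric propulsion systems for deep space missions")
    paragraphs = [
        ("doc1.txt", 3, "Solar electric propulsion enables long duration deep space missions."),
        ("doc2.txt", 7, "The quarterly budget review covered travel and training costs."),
    ]

    # Rank paragraphs by similarity to the concept, then export the scores to CSV
    # so they can feed Excel, a database, or downstream BI tools.
    ranked = sorted(
        ((src, para, similarity(concept, fingerprint(text))) for src, para, text in paragraphs),
        key=lambda row: row[2],
        reverse=True,
    )
    with open("agent_scores.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "paragraph", "similarity"])
        writer.writerows(ranked)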

Users employ agents in Pytheas AI to organize text based on contextual ideas and metadata dimensions, improving accuracy and consistency while saving substantial time in this tedious process.

The Basic Elements of Pytheas

Documents – Pytheas can analyze any form of unstructured text. In fact, our technology works best with semantically rich content written in your business vernacular, without external taxonomies or ontologies. Working at the paragraph level, it has been used on everything from text messages to database fields to long documents, always with full traceability to the source.

Conceptual Fingerprints – This is the “secret sauce” of our discovery capabilities. Pytheas uses the Nathan API keywords and associations to create semantic “fingerprints” of concepts. Because one concept can be written in multiple ways, our algorithm does not rely on word counts, natural language processing (NLP) or latent semantic analysis (LSA) when identifying and fingerprinting concepts.

Intelligent Agents – Pytheas agents examine and compare conceptual fingerprints to find traces of concepts buried within your data. Our premise is that the analyst is the expert and needs to be able to train their own army of software agents to “read” documents and deliver the relevant paragraphs. Used as a collection, the scores from a collection of agents set the context for a user’s query.

Paragraph Level Concept Discovery – Pytheas provides the ability to categorize and display concept results at the paragraph level. Users do not need to hunt through documents trying to find a concept that a search engine claims is present. Our system returns the paragraph(s) that closely match a concept and sorts and groups the concepts by similarity to one another. Paragraphs can be evaluated and traced back to their source documents for reporting and distribution.

Figure 1. Topic Mapper Entity and Sentiment in SEC Filings

Ease of Integration – The Pytheas application can be used with conventional desktop tools for ad hoc projects. For workflow automation, a RESTful API gives developers an easy way to process documents and export results to SQL or other databases for reporting and visualization.
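
As an illustration of that integration path, the sketch below posts documents to a scoring endpoint and writes the returned paragraph scores to a SQL table. The endpoint URL, agent identifier, and JSON field names are hypothetical placeholders rather than the documented Pytheas interface; the pattern of calling a RESTful service and persisting results for reporting is the point.

    import sqlite3
    import requests

    API_URL = "https://pytheas.example.com/api/v1/score"  # hypothetical endpoint

    def score_document(text, agent_id):
        """Send one document to the (hypothetical) scoring endpoint."""
        resp = requests.post(API_URL, json={"agent": agent_id, "text": text}, timeout=30)
        resp.raise_for_status()
        # Assumed response shape: [{"paragraph": int, "score": float}, ...]
        return resp.json()

    documents = {
        "report_q1": "First quarter operations summary ...",
        "report_q2": "Second quarter operations summary ...",
    }

    conn = sqlite3.connect("pytheas_results.db")
    conn.execute("CREATE TABLE IF NOT EXISTS scores (doc TEXT, paragraph INTEGER, score REAL)")

    for name, text in documents.items():
        results = score_document(text, agent_id="km-agent-01")  # hypothetical agent id
        conn.executemany(
            "INSERT INTO scores VALUES (?, ?, ?)",
            [(name, r["paragraph"], r["score"]) for r in results],
        )

    conn.commit()
    conn.close()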

Optional Entity Extraction and Sentiment (Figure 1 above) – Complementing paragraph-level concept detection is the ability to extract entities and/or score for sentiment, so this information can be added to visualizations and follow-on workflows. Clients can use their own technology for this purpose or add custom analytics to further refine the insight for social network analysis, tagging existing file headers or streamlining the flow of information to the analyst.

Defense Utility

The immediate benefit to DoD is increased productivity, consistent analysis and more effective information management. The long-term benefit is the ability to make quicker, more informed decisions.

Operational users of this prototype include anyone who has to search through data, including anyone using SharePoint and other common organizational databases. Analysts who must sift through massive amounts of data to discover relevant information will save countless hours with our prototype. In a similar use case at NASA, our customer was able to complete a typical six-week project in one week!

Company and Relevant Use Case

Led by ISC, personnel from ISC Consulting and ai-one inc. will execute the project.

ISC Consulting Group is a Service Disabled Veteran-Owned Small Business (SDVOSB). We are headquartered in Sierra Vista, Arizona, with operational offices at Ft. Huachuca, AZ; Orlando, FL; Ft. Gordon, GA; and Northern Virginia. ISC provides a full-spectrum of services, products & solutions supporting the DOD Intelligence Community and key commercial clients with advanced capabilities in Instructional Solutions, Cyber Security, Command and Control planning and operations, Intelligence operations, Information Technology, and Data Analytics through Artificial Intelligence products and services.

ai-one inc. is the developer of a proprietary core technology that emulates the complex pattern-recognition functions of the human brain, detecting the key features and contextual meaning of text, time-series and visual data. This technology will enable DIUx to score and analyze any piece of textual content and discover information by concept, bringing the dimension of AI understanding to knowledge management. It automatically generates a lightweight ontology that easily detects relationships among data elements, addressing the immediate problems facing the DIUx knowledge management process and schedule.

Existing Customers

ISC has served several clients with Pytheas technology, including NASA Marshall Space Flight Center (MSFC).   Currently, Pytheas is being used by MSFC’s Advanced Concepts Office (ACO) under a Cooperative Agreement to assist in technology roadmap development and separately by the Office of Strategic Analysis and Communication (OSAC) to manage and report on their portfolio of project investments (similar to SBIR grants).   For example, the roadmap project is described below:

Overview of the NASA Advanced Concepts TAPP Pilot Project

The Advanced Concepts Office (ACO) at MSFC, NASA, is developing and refining methods and processes for performing Information Based Decisions for Strategic Technology Investments. This system is currently referred to as TAPP, the Technology Alignment & Prioritization Process. The process supports the evaluation of technologies for investment by NASA and MSFC to ensure alignment with NASA mission plans, technology area priorities and strategic knowledge gaps.

TAPP creates an interactive system for exploring the almost mind-boggling complexity of planning for multiple missions using over 400 technologies (many still in basic research) and hundreds of interrelated elements/sub-elements over 30-year planning horizons.

Pytheas provides NASA the capability to have data mining agents parse and score unstructured content against the nearly 400 technologies identified in the 15 Technology Roadmaps.  This ability to score proposals with agents allows ACO to perform statistical analysis within the Information Based Decision framework for Strategic Investments.
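
A rough sketch of how such agent scores can be assembled for statistical analysis is shown below. The technology names, proposal texts, and the score() stand-in are invented for illustration; in the actual system each score would come from a trained Pytheas agent.

    import random

    technologies = ["Solar Electric Propulsion", "Cryogenic Fluid Management",
                    "Autonomous Rendezvous", "In-Situ Resource Utilization"]
    proposals = {"proposal_017": "Proposal text ...", "proposal_042": "Proposal text ..."}

    def score(proposal_text, technology):
        """Stand-in for a Pytheas agent similarity score in [0, 1]."""
        return round(random.random(), 2)

    # Build a proposal-by-technology score matrix for downstream statistical analysis.
    matrix = {
        name: {tech: score(text, tech) for tech in technologies}
        for name, text in proposals.items()
    }

    for name, scores in matrix.items():
        best = max(scores, key=scores.get)
        print(f"{name}: best aligned technology = {best} ({scores[best]:.2f})")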

The immediate benefit to ACO is increased productivity and consistent analysis. The long-term benefit is the ability to perform quicker, more informed technology assessments, feasibility analyses, and concept studies that align with NASA’s evolving strategic goals and multiple mission objectives.

Conclusion

Given a six-month prototype build period, ISC/ai-one will demonstrate to DIUx that the Pytheas AI application enables the organization to save critical time and human capital in the implementation and operation of knowledge management systems. Pytheas will empower the IC to rapidly and effectively sort through vast volumes of text data in order to gain knowledge and position decision makers with the right information to achieve the organization’s analytical research outcomes.

ai-one Powers Competitive Intelligence Analytics as a Service

Tuesday, June 7th, 2016

ai-one inc. and KDD Analytics put their artificial intelligence and business intelligence expertise to work for competitive analysts. After collaborating on projects from aerospace research to marketing surveys, the companies are pleased to announce a new service for C-suite executives and analysts.

This blog was posted earlier on our Analyst Toolbox website.

Day in the Life of a Financial Competitive Analyst

“It’s 10:00 PM, the night before the quarterly board meeting, and we are still pulling together financial data on our company and competitors into presentation worthy graphics.

Procrastination?  No, the tools and processes just make it a recurring battle. One key section of our report is hampered by a lack of standardization in SEC filings.  Our auditors routinely deliver the internal financials at the last minute.  Reformatting the financials in a way consistent with comparatives, let alone making them visual, interactive and providing scenario analysis capability is a time consuming hassle.

We always run out of time to actually “analyze” the results…there has to be a better way.”

– Earl Harvey, Senior Financial Competitive Analyst

Sound familiar?  Earl’s problem led him to KDD Analytics and ai-one, and ultimately to a collaboration developing CIaaS (Competitive Intelligence as a Service).

The State of Financial Competitive Intelligence

Financial data on your competitors comes from SEC filings (10-Q/10-K) via companies such as Dow Jones, FactSet, Edgar Online, ThomsonOne and Bloomberg, which provide aggregated financial data (typically) in Excel worksheets or through an API “firehose” that requires programming resources to navigate.

But then what?

The data still needs to be standardized across companies and reporting periods and presented in a visually digestible manner, often for people using different devices (desktop, tablet, mobile). Moreover, this process needs to be repeatable every quarter with a consistent visual format, and ideally delivered several days before the board meeting…a tall order for resource-constrained competitive intelligence analysts. As a result, “burning the midnight oil” sessions are the rule, not the exception.

How Do You Avoid This Last Minute Stress?

Avoiding this fire drill (without hiring on more resources) is possible by using a service that has:

  • Standardized the data
  • Developed the visuals, charts and scenarios
  • Loaded and analyzed the latest data, and ideally,
  • Used A.I. to “read” and organize the relevant text from the docs.

That is, a source that will deliver a finished, interactive solution in a timely manner allowing you to focus on insight and analysis of the financial data, so you’re ready to be brilliant on demand (and getting more sleep).

Introducing the Financial Analyst Toolbox (FaTbx™), financial competitive intelligence as a service.

Currently in beta as a custom service, FaTbx™ is a set of more than 30 presentation ready Tableau dashboards, displaying interactive, comparative financial data for your company and other public companies critical to your business ecosystem.

Get the big picture fast: rankings and financial health, trends and topic heat maps (from our tech stocks demo).


Then drill down:  Income statement waterfalls, balance sheet, cash flow details and topics (see how Apple, Google and Microsoft 2015 Q3 results compare below).


Developed by experts in competitive analysis, artificial intelligence, analytics and visualization, FaTbx™ shows your company’s and competitors’ financials in a consistent, standardized and easily digestible manner. Using the financials to spot issues and trends, the AI engine powers drill-down to the disclosure text in the filings: no need to pull up a 10-K and look for the narrative.

Filters adjust for financial category (e.g. income, cash flow, balance sheet, ratios), company, growth measure (e.g. quarter over quarter, year over year, CQGR), TTM and displayed time span.  Custom filters can be added based on your company’s need.
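
For readers unfamiliar with these growth measures, the short Python example below computes them on made-up quarterly revenue figures. The formulas are standard, but the numbers are not real company data and the calculations are not FaTbx™ code.

    # Eight quarters of made-up revenue, in $ millions (oldest first).
    quarterly_revenue = [50.2, 52.1, 53.8, 55.0, 56.9, 58.4, 60.1, 61.7]

    def qoq(series):
        """Quarter-over-quarter growth of the latest quarter."""
        return series[-1] / series[-2] - 1

    def yoy(series):
        """Year-over-year growth: latest quarter vs. the same quarter a year earlier."""
        return series[-1] / series[-5] - 1

    def cqgr(series):
        """Compound quarterly growth rate across the whole series."""
        periods = len(series) - 1
        return (series[-1] / series[0]) ** (1 / periods) - 1

    def ttm(series):
        """Trailing twelve months: the sum of the last four quarters."""
        return sum(series[-4:])

    print(f"QoQ:  {qoq(quarterly_revenue):.1%}")
    print(f"YoY:  {yoy(quarterly_revenue):.1%}")
    print(f"CQGR: {cqgr(quarterly_revenue):.1%}")
    print(f"TTM:  ${ttm(quarterly_revenue):.1f}M")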

FaTbx™ is available as a cost effective annual subscription with quarterly or monthly updates.  The standard service includes comparatives for three publicly traded competitors, suppliers or customers.  It is delivered online via Tableau Server Edition or through a private web portal.  Subscription tiers depend on the level of support, customization, information sources and macroeconomic data desired.  Custom integration with internal KPIs can be provided.

FaTbx™ – Financial CI as a Service.   We streamline the grunt work of financial competitive analysis so you can focus on your company’s strategy and response. To learn more, contact me about the beta program or request a live demo.

Tom

Rumsfeld Conundrum- Finding the Unknown Unknown

Tuesday, January 27th, 2015

Since we began building applications with our AI engine, we have focused on working with ideas, or concepts. With BrainDocs we built intelligent agents to find and score the similarity of ideas in paragraphs, but we still fell short of the vision we have for our solution. Missing was an intuitive, visual UI to explore content interactively using multiple concepts and metadata (like dates, locations, etc.). We want to give our users the power to create a rich and personal context to power through their research. What do I call this?

Some Google research led me to a great visualization and blog by David McCandless on the Taxonomy of Ideas. While the words in his viz are attributes of ideas, not the ideas themselves, it got me thinking in different ways about the problem.

Taxonomy of Ideas

If you substitute an idea (product or problem) in David’s matrix and add the dimension of time, you create a useful framework. If the idea above was “car”, then the top right might be Tesla and bottom left a Yugo (remember those?). Narrow the definition to “electric car” or generalize to “eco-friendly personal transportation” and the matrix changes. But insert an unsolved problem and now you have trouble applying the attributes. You also arrive at an innovator’s dilemma (not the seminal book by Clayton Christensen), the challenge of researching something that hasn’t been labeled and categorized yet.

Ideas begin in someone’s head. With research, debate, and engineering, they become products. Products have labels and categories that facilitate communication, search and commerce. The challenge for idea search on future problems is that the opposite holds: the ideas are not yet products, and the problems they solve may not have been defined yet. If I may, Donald Rumsfeld nailed the problem with this famous quote:

“There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.”

And if it’s an unknown unknown, it certainly hasn’t been labeled yet, so how do you search for it? Our CEO Walt Diggelmann used to say it this way: “ai-one gives you an answer to a question you did not know you had to ask.”

Innovators work in this whitespace.

If you could build and combine different intelligent (idea) agents for problems as easily as you test different combinations of words in a search box, you could drive an interactive and spontaneous exploration of ideas. In some ways this is the gift of our intelligence. New ideas and innovation are in great part combinatorial, collaborative and stimulated by bringing together seemingly unrelated knowledge to find new solutions.

Instead of pumping everything into your brain (or an AI) and hoping the ideas pop out, we want to give you the ability to mix combinations of brains, add goals and constraints and see what you can create. Matt Ridley termed this “ideas having sex”. This is our goal for Topic-Mapper (not the sex part).

So what better place to apply this approach than to the exploration of space? NASA already created a “taxonomy of ideas” for the missions of the next few decades. In my next blog I’ll describe the demo we’re working on for the grandest of the grand challenges, human space exploration.

Tom

AI, AGI, ASI, Deep Learning, Intelligent Machines.. Should you worry?

Saturday, January 17th, 2015

If the real life Tony Stark and technology golden boy, Elon Musk, is worried that AI is an existential threat to humanity, are we doomed? Can mere mortals do anything about this when the issue is cloaked in dozens of buzzwords and the primary voices on the subject are evangelists with 180 IQs from Singularity University? Fortunately, you can get smart and challenge them without a degree in AI from MIT.

There are good books on the subject. I like James Barrat’s Our Final Invention; while alarmist, it is thorough and provides a guide to a number of resources from both sides of the argument. One of those was the Machine Intelligence Research Institute (MIRI), founded by Eliezer Yudkowsky. The book below was recommended on the MIRI website and is a good primer on the subject.

Smarter Than Us – The Rise of Machine Intelligence by Stuart Armstrong can also be downloaded at iTunes.

“It will sharpen your focus to see AI from a different view. The book does not provide a manual for Friendly AI, but it shows the problems and points to the 3 critical things needed. We are evaluating the best way for ai-one to participate in the years ahead.” Walt Diggelmann, CEO ai-one.

In Chapter 11 Armstrong recommends we take an active role in the future development and deployment of AI, AGI and ASI. The developments are coming; the challenge is to make sure AI plays a positive role for everyone. A short summary:

“That’s Where You Come In . . .

There are three things needed—three little things that will make an AI future bright and full of meaning and joy, rather than dark, dismal, and empty. They are research, funds, and awareness.

Research is the most obvious.
A tremendous amount of good research has been accomplished by a very small number of people over the course of the last few years—but so much more remains to be done. And every step we take toward safe AI highlights just how long the road will be and how much more we need to know, to analyze, to test, and to implement.

Moreover, it’s a race. Plans for safe AI must be developed before the first dangerous AI is created.
The software industry is worth many billions of dollars, and much effort (and government/defense money) is being devoted to new AI technologies. Plans to slow down this rate of development seem unrealistic. So we have to race toward the distant destination of safe AI and get there fast, outrunning the progress of the computer industry.

Funds are the magical ingredient that will make all of this needed research—in applied philosophy, ethics, AI itself, and implementing all these results—a reality.
Consider donating to the Machine Intelligence Research Institute (MIRI), the Future of Humanity Institute (FHI), or the Center for the Study of Existential Risk (CSER). These organizations are focused on the right research problems. Additional researchers are ready for hire. Projects are sitting on the drawing board. All they lack is the necessary funding. How long can we afford to postpone these research efforts before time runs out?”

About Stuart: “After a misspent youth doing mathematical and medical research, Stuart Armstrong was blown away by the idea that people would actually pay him to work on the most important problems facing humanity. He hasn’t looked back since, and has been focusing mainly on existential risk, anthropic probability, AI, decision theory, moral uncertainty, and long-term space exploration. He also walks the dog a lot, and was recently involved in the coproduction of the strange intelligent agent that is a human baby.”

Since ai-one is a part of this industry and one of the many companies moving the field forward, there will be many more posts on the different issues confronting AI. We will try to keep you updated and hope you’ll join the conversation on Google+, Facebook, Twitter or LinkedIn. AI is already pervasive and developments toward AGI can be a force for tremendous good. Do we think you should worry? Yes, we think it’s better to lose some sleep now so we don’t lose more than that later.

Tom

(originally posted on www.analyst-toolbox.com)

ai-one and the Machine Intelligence Landscape

Monday, January 12th, 2015

In the sensationally titled Forbes post, Tech 2015: Deep Learning And Machine Intelligence Will Eat The World, author Anthony Wing Kosner surveys the impact of deep learning technology in 2015. This is nothing new for those in the field of AI. His post reflects the recent increase in coverage that artificial intelligence (AI) technologies and companies are getting in business and mainstream media. For a company that has been a core technology vendor in AI for over ten years, it’s a welcome change in perspective and attitude.

We are pleased to see ai-one correctly positioned as a core technology vendor in the Machine Intelligence Landscape chart featured in the article. The chart, created by Shivon Zilis, investor at BloombergBETA, is well done and should be incorporated into the research of anyone seriously tracking this space.

Especially significant is Zilis’ focus on “companies that will change the world of work” since these are companies applying AI technologies to innovation and productivity challenges across the public and private sectors. The resulting solutions will provide real value through the combination of domain expertise (experts and data) and innovative application development.

This investment thesis is supported by the work of Erik Brynjolfsson and Andrew McAfee in their book “The Second Machine Age”, a thorough discussion of value creation (and disruption) by the forces of innovation that is digital, exponential and combinatorial. The impact of these technologies will change the economics of every industry over years if not decades to come. Progress and returns will be uneven in their impact on industry, regional and demographic sectors. While deep learning is early in Gartner’s Hype Cycle, it is clear that the market value of machine learning companies and data science talent are climbing fast.

The need for data scientists is growing, but the business impact of AI may be limited in the near future by the lack of traditional developers who can apply these technologies. Jeff Hawkins of Numenta has spoken out on this issue and we agree. It is a fundamentally different way to create an application for “ordinary humans,” and until the “killer app” Hawkins speaks about is created, it will be hard to attract enough developers to invest time learning new AI tools. As the chart shows, there are many technologies competing for their time. Developers can’t build applications with buzzwords and one-size-fits-all APIs or collections of open-source algorithms. Technology vendors have a lot of work to do in this respect.

Returning to Kosner’s post, what exactly is deep learning and how is it different from machine learning/artificial intelligence? According to Wikipedia,

Deep learning is a class of machine learning training algorithms that use many layers of nonlinear processing units for feature extraction and transformation. The algorithms may be supervised or unsupervised and applications include pattern recognition and statistical classification.

  • are based on the (unsupervised) learning of multiple levels of features or representations of the data. Higher level features are derived from lower level features to form a hierarchical representation.
  • are part of the broader machine learning field of learning representations of data.
  • learn multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts.
  • form a new field with the goal of moving toward artificial intelligence. The different levels of representation help make sense of data such as images, sounds and texts.

These definitions have in common (1) multiple layers of nonlinear processing units and (2) the supervised or unsupervised learning of feature representations in each layer, with the layers forming a hierarchy from low-level to high-level features.
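
As a minimal illustration of “multiple layers of nonlinear processing units,” the snippet below stacks three layers, each an affine transform followed by a ReLU nonlinearity, so that higher-level features are computed from lower-level ones. The weights are random and untrained; this shows only the layered structure, not a working deep learning model.

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, in_dim, out_dim):
        """One nonlinear processing layer: affine transform followed by ReLU."""
        w = rng.normal(size=(in_dim, out_dim))
        b = np.zeros(out_dim)
        return np.maximum(0.0, x @ w + b)

    x = rng.normal(size=(1, 64))         # raw input features
    h1 = layer(x, 64, 32)                # low-level features
    h2 = layer(h1, 32, 16)               # mid-level features derived from h1
    h3 = layer(h2, 16, 8)                # high-level features derived from h2
    print(h1.shape, h2.shape, h3.shape)  # (1, 32) (1, 16) (1, 8)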

While the 4th bullet terms this a new field moving toward artificial intelligence, deep learning is generally considered to be part of the larger field of AI already. Deep learning and machine intelligence are not the same as human intelligence. Artificial intelligence in the definition above and in the popular press usually refers to Artificial General Intelligence (AGI). AGI and the next evolution, Artificial Super Intelligence (ASI), are the forms of AI that Stephen Hawking and Elon Musk are worried about.

This is powerful stuff, no question, but as an investor, user or application developer in 2015, look for the right combination of technology, data, domain expertise, and application talent applied to a compelling (valuable) problem in order to create a disruptive innovation (value). This is where the money is over the next five years, and this is our focus at ai-one.

Tom

Context, Graphs and the Future of Computing

Friday, June 20th, 2014

Robert Scoble and Shel Israel’s latest book, Age of Context, is a survey of contributions from across the globe to the forces influencing technology and our lives today. The five forces are mobile, social media, data, sensors and location. Scoble calls these the five forces of context; harnessed, they are the future of computing.

Pete Mortensen also addressed context in his brilliant May 2013 article in Fast Company, “The Future of Technology Isn’t Mobile, It’s Contextual.” So why is context so important (and difficult)? First, context is fundamental to our ability to understand the text we’re reading and the world we live in. In semantics, there is the meaning of the words in the sentence; the context of the page, chapter, book, and prior works or conversations; and the context that the reader’s education and experience add to the understanding. As a computing problem, this is the domain of text analytics.

Second, if you broaden the discussion as Mortensen does to personal intelligent agents (Siri, Google Now), the bigger challenge is complexity. The inability to understand context has always made it difficult for computers and people to work together. People, and the language we use to describe our world, are complex, not mathematical; you can’t be reduced to a formula or rule set, no matter how much data is crunched. Mortensen argues (and we agree) that the five forces are finally giving computers the foundational information needed to understand “your context,” and that context is expressed in four data graphs. These data graphs are:

  • Social (friends, family and colleagues),
  • Interest (likes & purchases),
  • Behavior (what you do & where) and
  • Personal (beliefs & values).

While Google Glass might be the poster child of a contextual UX, ai-one has the technology to power these experiences by extracting Mortensen’s graphs from the volumes of complex data generated by each of us through our use of digital devices and interaction with increasing numbers of sensors known as the Internet of Things (IoT).  The Nathan API is already being used to process and store unstructured text and deliver a representation of that knowledge in the form of a graph.  This approach is being used today in our BrainDocs product for eDiscovery and compliance.

In Age of Context, ai-one is pleased to be recognized as a new technology addressing the demands of these new types of data. The data and the applications that use them are no longer stored in silos where only domain experts can access them. With Nathan, the data space learns from the content, delivering a more relevant contextual response to applications in real time, with user interfaces that are multi-sensory, human and intuitive.

We provide developers this new capability in a RESTful API. In addition to extracting graphs from user data, developers can build biologically inspired intelligent agents that they can train and embed in intelligent architectures. Our new Nathan is enriched with NLP in a new Python middleware layer that allows us to reach more OEM developers. Running in the cloud and integrated with big data sources and ecosystems of existing APIs and applications, developers can quickly create and test new applications or add intelligence to old ones.

For end users, the Analyst Toolbox (BrainBrowser and BrainDocs) demonstrates the value proposition of our new form of artificial intelligence and shows developers how Nathan can be used with other technologies to solve language problems.  While we will continue to roll out new features to this SaaS offering for researchers, marketers, government and compliance professionals, the APIs driving the applications will be available to developers.

Mortensen closes, “Within a decade, contextual computing will be the dominant paradigm in technology.”  But how?  That’s where ai-one delivers.  In coming posts we will discuss some of the intelligent architectures built with the Nathan API.

ai-one Contributes to ETH Publication on Knowledge Representation

Tuesday, June 3rd, 2014

We are pleased to announce the availability of the following publication from the prestigious ETH Zurich. This book will be a valuable resource for developers, data scientists, and search and knowledge management educators and practitioners trying to deal with the massive amounts of information in both public and private data sources. We are proud to have our contribution to the field acknowledged in this way.

Knowledge Organization and Representation with Digital Technologies

http://www.degruyter.com/view/product/205460  |  ISBN: 978-3-11-031281-2

ai-one was invited to contribute as co-author to a chapter in this technical book.

In the anthology, readers will find a synopsis of very different conceptual and technological methods for modeling and digitally representing knowledge in knowledge organizations (universities, research institutes and educational institutions) and companies, presented through practical examples. Both basic models of the organization of knowledge and technical implementations are discussed, including their limitations and difficulties in practice. In particular, the areas of knowledge representation and the semantic web are explored. Best-practice examples and successful application scenarios provide the reader with a knowledge repository and a guide for implementing their own projects. The following topics are covered in the articles:

  •  hypertext-based knowledge management
  • digital optimization of the proven analog technology of the list box
  • innovative knowledge organization using social media
  • search process visualization for digital libraries
  • semantic events and visualization of knowledge
  • ontological mind maps and knowledge maps
  • intelligent semantic knowledge processing systems
  • fundamentals of computer-based knowledge organization and integration

The book also covers the coding of medical diagnoses, contributions to the automated creation of records management models, business fundamentals of computer-aided knowledge organization and integration, the concept of mega-regions to support search processes, and the management of print publications in libraries.

Available in German only at this time.

Wissensorganisation und -repräsentation mit digitalen Technologien

http://www.degruyter.com/view/product/205460  |  ISBN: 978-3-11-031281-2


Big Data Solutions: Intelligent Agents Find Meaning of Text

Friday, January 18th, 2013

 

What if your computer could find ideas in documents? Building on the idea of fingerprinting documents, ai-one helped develop ai-BrainDocs – a tool to mine large sets of documents for ideas using intelligent agents. This solves a big problem for knowledge workers: how to find ideas in documents that are missed by traditional keyword search tools (such as Google, Lucene, Solr, FAST, etc.).

Customers Struggle with Unstructured Text

Almost every organization struggles to find value in “big data” – especially ideas buried within unstructured text. Often a very limited set of vocabulary can be used to express very different ideas. Lawyers are particularly talented at this: They can use 100 unique words to express thousands of ideas by simply changing the ordering and frequencies of the words.

Lawyers are not the only ones who need to find ideas inside documents. Other use cases include finding and classifying complaints, identifying concepts within social media feeds such as Twitter or Facebook, and mining PubMed to find related research articles. Recently, several healthcare companies have contacted us about mining electronic health records (EHR) data to find information buried within doctors’ notes so they can predict adverse reactions, find co-morbidity risks and detect fraud.

The common denominator for all these use cases is simple: how to find “what matters most” in documents? They need a way to find these ideas fast enough to keep pace with the growth in documents. Given that information is growing at almost 20% per year, a very big problem now will be enormous next year.

Problems with Current Approaches

We’ve heard numerous stories from customers who were frustrated by the cost, complexity and expertise required to implement solutions that enable machines to read and understand the meaning of free-form text. Often these solutions use latent semantic indexing (LSI) and latent Dirichlet allocation (LDA). In one case, a customer spent more than two years trying to combine LSI with a Microsoft FAST enterprise search appliance running on SharePoint. It failed because they were searching a high volume of legal documents with very low variability. They were searching legal contracts to find paragraphs that included a very specific legal concept that could be expressed with many different combinations of words. Keyword search failed because the legal concept was expressed with commonly used words. LSI and LDA failed because the systems required a very large training set — often involving hundreds of documents. Even after reducing the specificity requirements, LSI and LDA still failed because they could not find the legal ideas at the paragraph level.

Inspiration

We found inspiration in the complaints we heard from customers: What if we could build an “intelligent agent” that could read documents like a person? We thought of the agent as an entry-level staff person who could be taught with a few examples then highlight paragraphs that were similar to (but not exactly like) the teaching examples.

Solution: Building Intelligent Agents

For several months, we have been developing prototypes of intelligent agents that mine unstructured text to find meaning. We built a Java application that combines ai-one’s machine learning API with natural language processing (OpenNLP) and a NoSQL database (MongoDB). Our approach generates an “ai-Fingerprint,” a representational model of a document built from keywords and association words. The “ai-Fingerprint” is similar to a graph G[V,E] where G is the knowledge representation, V (vertices) are keywords, and E (edges) are associations. This can also be thought of as a topic model.

The ai-Fingerprint can be generated for almost any size of text – from sentences to entire libraries of documents. As you might expect, the “intelligence” (or richness) of the ai-Fingerprint is proportional to the size of the text it represents. Very sparse text (such as a tweet) has very little meaning. Large texts, such as legal documents, are very rich. This approach to topic modeling is precise — even without training or using external ontologies.
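
The sketch below shows what such a graph-shaped fingerprint might look like as a JSON object, with keywords as vertices and association words as weighted edges. The field names and weights are illustrative, not the actual NathanICE output format; the commented lines show how an object like this could be stored in MongoDB with pymongo.

    import json

    # Illustrative ai-Fingerprint: a graph G[V, E] serialized as JSON.
    ai_fingerprint = {
        "document": "contract_0042.pdf",
        "paragraph": 12,
        "vertices": ["consent", "written", "assignment", "party"],      # keywords
        "edges": [                                                      # associations
            {"source": "consent", "target": "written", "weight": 0.83},
            {"source": "consent", "target": "party", "weight": 0.61},
            {"source": "assignment", "target": "party", "weight": 0.57},
        ],
    }

    # In ai-BrainDocs an object like this would be persisted to MongoDB, e.g.:
    #   from pymongo import MongoClient
    #   MongoClient()["braindocs"]["fingerprints"].insert_one(ai_fingerprint)
    print(json.dumps(ai_fingerprint, indent=2))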

[NOTE: We are experimenting with using ontologies (such as OWL and RDF) as a way to enrich ai-Fingerprints with more intelligence. We are eager to find customers who want to build prototypes using this approach.]

The Secret Sauce

The magic is that ai-one’s API automatically detects keywords and associations – so it learns faster, with fewer documents and provides a more precise solution than mainstream machine learning methods using latent semantic analysis. Moreover, using ai-one’s approach makes it relatively easy for almost any developer to build intelligent agents.

How to Build Intelligent Agents?

To build an intelligent agent, we first had to consider how a human reads and understands a document.

The Human Perspective

Humans are very good at detecting ideas – regardless of the words used to express them. As mentioned above, lawyers can express dozens of completely different legal concepts with a vocabulary of just a few hundred words. Humans can recognize the subtle differences between two paragraphs by how a lawyer uses words – both in meaning (semantics) and structure (syntax). Part of the cleverness of a lawyer is finding ways to combine as few words as possible to express a very precise idea that accomplishes a specific legal or business objective. In legal documents, each new idea is almost always expressed in a paragraph. So two paragraphs might have the exact same words but express completely different ideas.

To find these ideas, a person (or computer) must detect the patterns of word use – similar to finding a pattern in a signal. For example, as a child I knew I was in trouble when my mother called me by my first and last name – the combination of these words created a “signal” that was different than when she just used my first name. Similarly, a legal concept has a different meaning if two words occur together, such as “written consent,” than if it only uses the word “consent.”

The (Conventional) Machine Learning Perspective

It’s almost impossible to program a computer to find such “faint signals” within a large number of documents. To do so would require a computer to be programmed to find all possible combinations of words for a given idea to search and match.

Machine learning technologies enable computers to identify features within the data to detect patterns. The computer “learns” by recognizing the combinations of features as patterns.

[There are many forms of machine learning – so I will keep focused only on those related to our text analytics problem.]

Natural Language Processing

One of the most important forms of machine learning for text analytics is natural language processing (NLP). NLP tools are very good at codifying the rules of language for computers to detect linguistic features – such as parts of speech, named entities, etc.

However (at the time of this writing), most NLP systems can’t detect patterns unless they are explicitly programmed or trained to do so. Linguistic patterns are very domain specific. The language used in medicine is different than what is used in law, etc. Thus, NLP is not easily generalized. NLP only works in specific situations where there is predictable syntax, semantics and context. IBM Watson can play Jeopardy! but has had tremendous problems finding commercial applications in marketing or medical records processing. Very few organizations have the budget or expertise to train NLP systems. They are left to either buy an off-the-shelf solution (such as StoredIQ ) or hire a team of PhDs to modify one of the open-source NLP tools. Good luck.

Latent Analysis Techniques

Tools such as latent semantic analysis (LSA), latent semantic indexing (LSI) and latent Dirichlet allocation (LDA) are all capable of detecting patterns within language. However, they require tremendous expertise to implement and often require large numbers of training documents. LSA and LSI are computationally expensive because they must recalculate the relationships between features each time they are given something new to learn. Thus, learning the meaning of the 1,001st document requires a calculation across the 1,000 previously learned documents. LSA uses a statistical approach called singular value decomposition to isolate keywords. Unlike LSA, ai-one’s technology also detects the association words that give a keyword context.

Similar to our ai-Fingerprint approach, LDA uses a graphical model for topic discovery. However, it takes tremendous skill to develop applications using LDA. Even when implemented, it requires the user to make informed guesses about the nature of the text. Unlike LDA, ai-one’s technology can be learned in a few hours. It requires no supervision or human interaction. It simply detects the inherent semantic value of text – regardless of language.

Our First Intelligent Agent Prototype: ai-BrainDocs

It took our team about a month to build the initial version of ai-BrainDocs. Our team used ai-one’s keyword and association commands to generate a graph for each document. This graph goes into MongoDB as a JSON object that represents the knowledge (content) of each document.

Next we created an easy way to build intelligent agents. We simply provide the API with examples of the concepts we want to find. This training set can be very short. For one type of legal contract, it took only 4 examples of text for the intelligent agent to achieve 90% accuracy in finding similar concepts.
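
A minimal sketch of this train-then-match loop is shown below. The word-count fingerprint and cosine similarity are stand-ins for the ai-one API calls, and the example texts and threshold are invented; only the structure (train an agent on a handful of example paragraphs, then flag new paragraphs whose score clears a user-set sensitivity threshold) follows the description above.

    import re
    from collections import Counter
    from math import sqrt

    def fingerprint(text):
        """Stand-in fingerprint: a word-count vector (illustrative only)."""
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def similarity(a, b):
        """Cosine similarity between two stand-in fingerprints."""
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    class IntelligentAgent:
        """Agent trained on a few example paragraphs for a single concept."""

        def __init__(self, examples, sensitivity=0.3):
            self.examples = [fingerprint(t) for t in examples]
            self.sensitivity = sensitivity  # user-tunable threshold

        def matches(self, paragraph):
            fp = fingerprint(paragraph)
            score = max(similarity(fp, ex) for ex in self.examples)
            return score >= self.sensitivity, score

    # Train with a handful of examples of the target concept.
    agent = IntelligentAgent([
        "Neither party may assign this agreement without prior written consent.",
        "Assignment of rights requires the written consent of both parties.",
    ])

    hit, score = agent.matches("The supplier shall not assign the contract without written consent.")
    print(hit, round(score, 2))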

Unlike solutions that use LSI, LDA and other technologies, the intelligent agents in ai-BrainDocs find ideas at the paragraph level. This is a huge advantage when looking at large documents – such as medical research or SEC filings.

Next we built an interface that allows the end-user to control the intelligent agents by setting thresholds for sensitivity and determining how many paragraphs to scan at a time.

Our first customers are now testing ai-BrainDocs – and so far they love it. We expect to learn a lot as more people use the tool for different purposes. We look forward to developing ways for intelligent agents to interact – just like people – by comparing what they find within documents. We are finding that it is best for each agent to specialize in a specific subject, so finding ways for agents to compare their results using Boolean operators enables them to find similarities and differences between documents.

One thing is clear: Intelligent agents are ideal for mining unstructured text to find small ideas hidden in big data.

We look forward to reporting more on our work with ai-BrainDocs soon.

Posted by: Olin Hyde