Andrew McCallum


Selected Publications by Topic (since 2001)


Shortcuts:

  1. Social Network Analysis and Clustering
  2. Coreference and Object Correspondence
  3. Efficient Inference and Learning in Graphical Models
  4. Joint Inference for NLP
  5. Information Extraction
  6. Semi-supervised Learning, Active Learning, Interactive Learning
  7. Bioinformatics
  8. Computer Vision, Networking, etc.
  9. Text Classification

Social Network Analysis and Clustering

  • Group and Topic Discovery from Relations and Text. Xuerui Wang, Natasha Mohanty and Andrew McCallum. KDD Workshop on Link Discovery: Issues, Approaches and Applications (LinkKDD) 2005. (Social network analysis that simultaneously discovers groups of entities and also clusters attributes of their relations, such that clustering in each dimension informs the other. Applied to the voting records and corresponding text of resolutions from the U.S. Senate and the U.N., showing that incorporating the votes results in more salient topic clusters, and that different groupings of legislators emerge from different topics.)
  • Topic and Role Discovery in Social Networks. Andrew McCallum, Andres Corrada-Emmanuel and Xuerui Wang. IJCAI, 2005. (Conference paper version of tech report by same authors in 2004 below. Also includes new results with Role-Author-Recipient-Topic model. Discover roles by social network analysis with a Bayesian network that models both links and text messages exchanged on those links. Experiments with Enron email and academic email.)
  • The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email. Andrew McCallum, Andres Corrada-Emmanuel, Xuerui Wang. Technical Report UM-CS-2004-096, 2004. (Also presented at the NIPS'04 Workshop on "Structured Data and Representations in Probabilistic Models for Categorization".) (Social network analysis that not only models links between people, but also the word content of the messages exchanged between them. Discovers salient topics guided by the sender-recipient structure in the data, and provides improved ability to measure role similarity between people. A generative model in the style of Latent Dirichlet Allocation.)
  • Disambiguating Web Appearances of People in a Social Network. Ron Bekkerman and Andrew McCallum. WWW Conference, 2005. (Find homepages and other Web pages mentioning particular people. Do a better job by leveraging a collection of related people.)
  • Multi-Way Distributional Clustering via Pairwise Interactions. Ron Bekkerman, Ran El-Yaniv and Andrew McCallum. ICML 2005. (Distributional clustering in multiple feature dimensions or modalities at once, made efficient by a factored representation as used in graphical models, and by a combination of top-down and bottom-up clustering. Results on email clustering, and new best results on 20 Newsgroups.)
  • Extracting Social Networks and Contact Information from Email and the Web. Aron Culotta, Ron Bekkerman and Andrew McCallum. Conference on Email and Spam (CEAS) 2004. (Describes an early version of an end-to-end system that automatically populates your email address book with a large social network, including "friends-of-friends," and information about people's expertise.)
  • An Exploration of Entity Models, Collective Classification and Relation Description. Hema Raghavan, James Allan and Andrew McCallum. KDD Workshop on Link Analysis and Group Detection, August 2004. (Part of a student synthesis project: includes an application of RMNs to classifying people in newswire.)
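The Author-Recipient-Topic model above conditions each message's topic distribution on its (author, recipient) pair. A minimal sketch of the generative process it posits, with toy sizes and symmetric Dirichlet priors chosen for illustration (not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(1)
A, T, V = 4, 3, 20   # authors, topics, vocabulary size (toy sizes)

# Each (author, recipient) pair has a distribution theta over topics; each
# topic has a distribution phi over words.  A message's words are drawn by
# first picking a topic from theta[author, recipient], then a word from phi.
theta = rng.dirichlet(np.ones(T), size=(A, A))   # theta[a, r]: topic mixture
phi = rng.dirichlet(np.ones(V), size=T)          # phi[t]: word distribution

def generate_message(author, recipient, n_words):
    topics = rng.choice(T, size=n_words, p=theta[author, recipient])
    return np.array([rng.choice(V, p=phi[z]) for z in topics])

msg = generate_message(author=0, recipient=2, n_words=10)
```

In the paper this process is inverted with Gibbs-style inference to recover topics and role similarity from observed email; the sketch only shows the forward direction.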

Coreference and Object Correspondence

Efficient Inference and Learning in Graphical Models

  • Fast, Piecewise Training for Discriminative Finite-state and Parsing Models. Charles Sutton and Andrew McCallum. Center for Intelligent Information Retrieval Technical Report IR-403. 2005. (Further results with "piecewise training", a method also described in a UAI'05 paper below.)
  • Piecewise Training for Undirected Models. Charles Sutton and Andrew McCallum. UAI, 2005. (Efficiently train a large graphical model in separately normalized pieces, and amazingly often obtain higher accuracy than without this approximation. This paper also shows that this piecewise objective is a lower bound on the exact likelihood, and gives results with three different graphical model structures.)
  • Constrained Kronecker Deltas for Fast Approximate Inference and Estimation. Chris Pal, Charles Sutton, Andrew McCallum. Submitted to UAI, 2005. (Sometimes the graph of the graphical model is not large and complex, but the cardinality of the variables is large. This paper describes a new and generalized method for beam search on graphical models, showing positive experimental results for both inference and training. Experiments on NetTalk.)
  • Piecewise Training with Parameter Independence Diagrams: Comparing Globally- and Locally-trained Linear-chain CRFs. Andrew McCallum and Charles Sutton. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-383. 2004. (Also presented at NIPS 2004 Workshop on Learning with Structured Outputs.) (Large undirected graphical models are expensive to train because they require global inference to calculate the gradient of the parameters. We describe a new method for fast training in locally-normalized pieces. Amazingly, the resulting models also give higher accuracy than their globally-trained counterparts.)
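The piecewise objective described in these papers replaces the global partition function with per-factor local normalizers; for nonnegative potentials, the product of local normalizers upper-bounds the true partition function, so the piecewise objective lower-bounds the exact log-likelihood. A toy numerical check on a two-factor chain (potentials and the labeling are arbitrary, just to exercise the bound):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3  # label cardinality

# Two chain factors over label pairs (y1,y2) and (y2,y3), positive potentials.
psi1 = rng.uniform(0.5, 2.0, (K, K))
psi2 = rng.uniform(0.5, 2.0, (K, K))

# Exact log-likelihood of labeling (y1,y2,y3) = (0,1,2), with the global
# partition function Z summing over all K**3 labelings.
Z = sum(psi1[a, b] * psi2[b, c]
        for a in range(K) for b in range(K) for c in range(K))
exact_ll = np.log(psi1[0, 1]) + np.log(psi2[1, 2]) - np.log(Z)

# Piecewise objective: each factor is normalized locally, so no global
# inference is needed to compute (or differentiate) it.
piecewise_ll = (np.log(psi1[0, 1]) - np.log(psi1.sum())
                + np.log(psi2[1, 2]) - np.log(psi2.sum()))
```

Here `piecewise_ll <= exact_ll` always holds, since every term of Z appears (with extra nonnegative terms) in the product of local normalizers.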

 

Joint Inference for NLP

Information Extraction

  • Feature Bagging: Preventing Weight Undertraining in Structured Discriminative Learning. Charles Sutton, Michael Sindelar, and Andrew McCallum. Center for Intelligent Information Retrieval, University of Massachusetts Technical Report IR-402. 2005. (Avoid a common, under-appreciated problem: overly heavy reliance on a few discriminative features which may not be as reliably present in the testing data. Discusses four methods of separate training and combination, and presents statistically-significant improvements, including new best results on CoNLL-2000 NP Chunking.)
  • Composition of Conditional Random Fields for Transfer Learning. Charles Sutton and Andrew McCallum. Proceedings of Human Language Technologies / Empirical Methods in Natural Language Processing (HLT/EMNLP) 2005. (Improve information extraction from email data by using the output of another extractor that was trained on large quantities of newswire. Improve accuracy further by using joint inference between the two tasks, so that the final target task can actually affect the output of the intermediate task.)
  • Reducing Labeling Effort for Structured Prediction Tasks. Aron Culotta and Andrew McCallum. AAAI, 2005. (A step toward bringing trainable information extraction to the masses! Make it easier for end-users to train IE by providing multiple-choice labeling options, and propagating any constraints their labels provide on portions of the record-labeling task.)
  • Extracting Social Networks and Contact Information from Email and the Web. Aron Culotta, Ron Bekkerman and Andrew McCallum. Conference on Email and Spam (CEAS) 2004. (Describes an early version of an end-to-end system that automatically populates your email address book with a large social network, including "friends-of-friends," and information about people's expertise.)
  • Accurate Information Extraction from Research Papers using Conditional Random Fields. Fuchun Peng and Andrew McCallum. Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004. (Applies CRFs to extraction from research paper headers and reference sections, to obtain current best-in-the-world accuracy. Also compares some simple regularization methods.)
  • Chinese Segmentation and New Word Detection using Conditional Random Fields. Fuchun Peng, Fangfang Feng, and Andrew McCallum. Proceedings of The 20th International Conference on Computational Linguistics (COLING 2004), August 23-27, 2004, Geneva, Switzerland. (State-of-the-art Chinese word segmentation with CRFs, with rich features and many lexicons; also using confidence estimation to add new words to the lexicon.)
  • Confidence Estimation for Information Extraction. Aron Culotta and Andrew McCallum. Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004, short paper. (How to provide not only an answer, but a formally-justified confidence in that answer, using constrained forward-backward.)
  • Rapid Development of Hindi Named Entity Recognition Using Conditional Random Fields and Feature Induction. Wei Li and Andrew McCallum. ACM Transactions on Asian Language Information Processing, 2003. (How we developed a named entity recognition system for Hindi in just a few weeks.)
  • Efficiently Inducing Features of Conditional Random Fields. Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2003. (CRFs give you the great power to include a kitchen sink's worth of features. How do you decide which ones to include to avoid over-fitting and running out of memory? A formal, information-theoretic approach, with carefully-chosen approximations to make it efficient with millions of candidate features. This technique was key to the success on Hindi above, as well as to work by Pereira's group at UPenn.)
  • Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. Andrew McCallum and Wei Li. Seventh Conference on Natural Language Learning (CoNLL), 2003. (This is the first publication about named entity extraction with CRFs.)
  • Table Extraction Using Conditional Random Fields. David Pinto, Andrew McCallum, Xing Wei and W. Bruce Croft. Proceedings of the ACM SIGIR, 2003. (Application of CRFs to finding tables in government reports. Uses both language and layout features.)
  • Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. John Lafferty, Andrew McCallum and Fernando Pereira. ICML-2001. (A conditionally-trained model for sequences and other structured data, with global normalization. The original CRF paper. Don't bother reading the section on parameter estimation; use BFGS instead of Iterative Scaling, e.g. see [McCallum UAI 2003].)
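Several entries above work on the forward-backward lattice of a linear-chain model; the confidence estimate of Culotta and McCallum, for example, is the forward mass of paths constrained to agree with an extracted field, divided by the total mass. A log-space sketch of that computation (the scores, positions, and labels are random and purely illustrative):

```python
import numpy as np

def log_forward(trans, emit, constraints=None):
    """Forward pass of a linear-chain model in log space.

    trans[i, j]: log score of label i -> label j.
    emit[t, j]:  log score of label j at position t.
    constraints: optional {position: forced_label}; paths that disagree
    contribute zero mass (the constrained lattice).
    """
    T, K = emit.shape
    def clamp(vec, t):
        if constraints and t in constraints:
            out = np.full(K, -np.inf)
            out[constraints[t]] = vec[constraints[t]]
            return out
        return vec
    alpha = clamp(emit[0], 0)
    for t in range(1, T):
        alpha = clamp(emit[t] + np.logaddexp.reduce(alpha[:, None] + trans,
                                                    axis=0), t)
    return np.logaddexp.reduce(alpha)

rng = np.random.default_rng(3)
K, T = 3, 5
trans = rng.normal(size=(K, K))
emit = rng.normal(size=(T, K))

log_Z = log_forward(trans, emit)
# Confidence that positions 1..2 take labels (0, 2): constrained over total.
conf = np.exp(log_forward(trans, emit, {1: 0, 2: 2}) - log_Z)
```

By construction the constrained masses at a single position sum to the total, so the per-position confidences form a proper distribution over labels.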

Semi-supervised Learning, Active Learning, Interactive Learning

  • Semi-Supervised Sequence Modeling with Syntactic Topic Models. Wei Li and Andrew McCallum. AAAI, 2005. (Learn a low-dimensional manifold from large quantities of unlabeled text data, then use components of the manifold as additional features when training a linear-chain CRF with limited labeled data. The manifold is learned using HMM-LDA [Griffiths, Steyvers, Blei, Tenenbaum 2004], an unsupervised model with special structure suitable for sequences and topics. Experiments with English part-of-speech tagging and Chinese word segmentation.)
  • Reducing Labeling Effort for Structured Prediction Tasks. Aron Culotta and Andrew McCallum. AAAI, 2005. (A step toward bringing trainable information extraction to the masses! Make it easier for end-users to train IE by providing multiple-choice labeling options, and propagating any constraints their labels provide on portions of the record-labeling task.)
  • Interactive Information Extraction with Constrained Conditional Random Fields. Trausti Kristjansson, Aron Culotta, Paul Viola and Andrew McCallum. Nineteenth National Conference on Artificial Intelligence (AAAI 2004). San Jose, CA. (Winner of Honorable Mention Award.) (Help a user interactively correct the results of extraction by providing uncertainty cues in the UI, and by using constrained Viterbi to automatically make additional corrections after the first human correction.)
  • A Note on Semi-supervised Learning using Markov Random Fields. Wei Li and Andrew McCallum. Technical Note, February 3, 2004. (A general framework for semi-supervised learning in Conditional Random Fields, with a focus on learning the distance metric between instances. Experimental results with collective classification of documents.)
  • Learning with Scope, with Application to Information Extraction and Classification. David Blei, Drew Bagnell and Andrew McCallum. Conference on Uncertainty in Artificial Intelligence (UAI), 2002. (Learn highly reliable formatting-based extractors on the fly at test time, using graphical models and variational inference. Describes both generative and conditional versions of the model.)
  • Toward Optimal Active Learning through Sampling Estimation of Error Reduction. Nick Roy and Andrew McCallum. ICML-2001. (A leave-one-out approach to active learning.)
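The Roy and McCallum criterion picks the query whose labeling, averaged over the current model's posterior, is expected to most reduce estimated error on the unlabeled pool. A toy sketch with a nearest-class-mean classifier standing in for the paper's Naive Bayes; all data, sizes, and helper names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D two-class pool; we start with one labeled example per class.
X = np.concatenate([rng.normal(-2, 1, 30), rng.normal(2, 1, 30)])
y = np.array([0] * 30 + [1] * 30)
labeled = [0, 30]
pool = [i for i in range(60) if i not in labeled]

def posteriors(means, x):
    # Softmax over negative squared distance to each class mean.
    scores = -(x[:, None] - means[None, :]) ** 2
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_means(idx, labels):
    return np.array([X[[i for i, l in zip(idx, labels) if l == c]].mean()
                     for c in (0, 1)])

def expected_pool_error(idx, labels):
    # Sample estimate of future error: mean (1 - max posterior) on the pool.
    p = posteriors(fit_means(idx, labels), X[pool])
    return (1 - p.max(axis=1)).mean()

# For each candidate query, average the retrained pool error over the
# current model's posterior for that candidate's label; query the minimizer.
means = fit_means(labeled, [y[i] for i in labeled])
best, best_err = None, np.inf
for i in pool:
    p_i = posteriors(means, X[[i]])[0]
    err = sum(p_i[c] * expected_pool_error(labeled + [i],
                                           [y[j] for j in labeled] + [c])
              for c in (0, 1))
    if err < best_err:
        best, best_err = i, err
```

The expense of retraining once per candidate-label pair is the cost the paper's sampling estimates are designed to keep tractable.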

Bioinformatics

  • Gene Prediction with Conditional Random Fields. Aron Culotta, David Kulp, and Andrew McCallum. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005. (Use finite-state CRFs to locate introns and exons in DNA sequences. Shows the advantages of CRFs' ability to straightforwardly incorporate homology evidence from protein databases.)

Computer Vision, Networking, etc.

Text Classification