Microsoft Research Focus 22 | Week of August 14, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


HyWay: Enabling Mingling in the Hybrid World

As remote work has grown in recent years, videoconferencing tools like Teams have supported structured meetings with a scheduled time, a specific agenda, and a set of invitees. For unstructured interactions, like hallway conversations or water cooler chats, newer “spatial” tools such as Gather and SpatialChat have emerged. But these are confined to users in virtual-only settings.

Many organizations and events now offer a mix of in-person and remote attendance, or “hybrid” work. This creates a new challenge for remote workers or conference goers who want to stay visible to, and mingle with, their colleagues attending in person. Existing tools fall short because they do not support unstructured interactions, hybrid settings, or both.

In a recent paper, HyWay: Enabling Mingling in the Hybrid World, researchers from Microsoft present a system to support informal interactions among physical and virtual participants. HyWay lets remote users see and hear, and be seen and heard by, in-person users through large displays placed in hallways or “physical zones.” Remote users move between zones using a map-based interface. In-person users, who aren’t tethered to a device or app, can simply walk from one zone to another.

The paper includes user survey findings from multiple deployments.

Microsoft Research Podcast

AI Frontiers: The future of causal reasoning with Emre Kiciman and Amit Sharma

Emre Kiciman and Amit Sharma discuss their paper “Causal Reasoning and Large Language Models: Opening a New Frontier for Causality” and how it examines the causal capabilities of large language models (LLMs) and their implications.


Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, are the standard tables in relational databases. However, a survey of real spreadsheet-tables and web-tables shows that over 30% of tables “in the wild” do not conform to the relational standard. This means complex table-restructuring transformations are needed before these tables can be queried using SQL-based analytics tools. Unfortunately, the required transformations are non-trivial to program, creating a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Power BI/Tableau forums.

In a new paper, Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples, researchers from Microsoft present a system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages). This system transforms non-relational tables into standard relational forms for downstream analytics, so users no longer need to program the transformations manually.

The research includes an extensive benchmark for this new task, compiled by collecting 244 real test cases from publicly available spreadsheets and online forums. The accompanying evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.
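To make the idea of “relationalizing” a table concrete, here is a minimal hand-written sketch of one common restructuring step: unpivoting a wide table so that each row describes one entity. It is not output from Auto-Tables and not the paper's method. The `unpivot` helper, the sample table, and the column names are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def unpivot(
    rows: List[Dict[str, object]],
    id_columns: List[str],
    var_name: str,
    value_name: str,
) -> List[Dict[str, object]]:
    """Turn a wide table into a long one with a single value per row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue  # identifier columns are copied, not unpivoted
            record = {key: row[key] for key in id_columns}
            record[var_name] = column    # former column header becomes a value
            record[value_name] = value   # former cell becomes its own row
            long_rows.append(record)
    return long_rows


# Hypothetical spreadsheet-style table: years are spread across columns,
# so the table does not follow the one-row-per-entity relational form.
wide_table = [
    {"region": "North", "2021": 10, "2022": 12},
    {"region": "South", "2021": 7, "2022": 9},
]

relational_table = unpivot(wide_table, ["region"], "year", "sales")
for record in relational_table:
    print(record)
```

After this step, each row holds one (region, year, sales) fact, which is the shape that SQL-style filters and aggregations expect. A real pipeline of the kind described above may chain several such steps.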


Learning to Retrieve In-Context Examples for Large Language Models

In-context learning is an emerging paradigm that allows large language models (LLMs) to perform tasks with few-shot examples, without requiring any updates to the model parameters. However, the effectiveness of in-context learning is heavily reliant on the quality of the selected examples.

In a new paper, Learning to Retrieve In-Context Examples for Large Language Models, researchers from Microsoft propose a novel framework to iteratively train dense retrievers that can identify high-quality in-context examples for LLMs. This framework first trains a reward model based on LLM feedback to evaluate the quality of candidate examples, then uses knowledge distillation to train a bi-encoder-based dense retriever. Experiments on a suite of 30 tasks demonstrate that the framework significantly enhances in-context learning performance. The research also demonstrates that the framework generalizes to tasks unseen during training. An in-depth analysis reveals that the model improves performance by retrieving examples with similar patterns, and the gains are consistent across LLMs of varying sizes.
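The two stages described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the paper's implementation: the embeddings and reward scores are made-up toy values, similarity is a plain dot product, and the distillation objective is assumed to be a KL divergence between the reward model's ranking (teacher) and the retriever's ranking (student), which is a common choice for knowledge distillation. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def distillation_loss(
    reward_scores: Sequence[float], retriever_scores: Sequence[float]
) -> float:
    """KL(teacher || student) over the candidate examples for one query."""
    teacher = softmax(reward_scores)
    student = softmax(retriever_scores)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


def retrieve_top_k(
    query_vec: Sequence[float], candidate_vecs: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar candidates."""
    scored = [(i, dot(query_vec, v)) for i, v in enumerate(candidate_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for bi-encoder outputs (assumed values).
query_vec = [1.0, 0.0]
candidate_vecs = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]

# Retrieval: pick the two candidates closest to the query.
top = retrieve_top_k(query_vec, candidate_vecs, k=2)

# Distillation: compare the retriever's scores with assumed reward scores.
reward_scores = [2.0, -1.0, 0.5]
retriever_scores = [dot(query_vec, v) for v in candidate_vecs]
loss = distillation_loss(reward_scores, retriever_scores)
print(top, loss)
```

In a training loop, a loss of this kind would be minimized with respect to the retriever's parameters so that its similarity scores rank candidates the way the reward model does. At inference time only the retrieval step is needed.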


End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

Computer-Aided Pronunciation Training (CAPT) systems are powerful tools designed to help people improve their language skills by using advanced AI technologies. Pronunciation assessment is a major challenge in CAPT, especially at the word or phoneme level. To obtain word-level or phoneme-level scores, current methods usually rely on alignment components to obtain the acoustic features of each word or phoneme, so assessment performance is limited by the accuracy of the alignments.

To address this problem, a new paper from researchers at Microsoft, End-to-End Word-Level Pronunciation Assessment with MASK Pre-training, proposes a simple yet effective method called Masked pre-training for Pronunciation Assessment (MPA). By incorporating a mask-predict strategy, MPA allows the model to train in an end-to-end manner, eliminating the problem of misalignment in word-level assessment. Furthermore, the researchers designed two evaluation strategies that enable the model to conduct assessments in both unsupervised and supervised settings. Experimental results on the SpeechOcean762 dataset demonstrate that MPA achieves better performance than previous methods, without any explicit alignment. MPA still has some limitations, such as requiring more inference time and reference text. Those limitations are expected to be addressed in future work.
