Papers made digestible
Our architecture simplifies the obstacle-perception
problem to that of place-dependent change detection. While we use the method with VT&R, it
can be generalized to suit arbitrary path-following applications.
Visual Teach and Repeat 3 (VT&R3), a generalization of stereo VT&R, achieves
long-term autonomous path-following using topometric mapping and localization
from a single rich sensor stream. In this paper, we improve the capabilities of
a LiDAR implementation of VT&R3 to reliably detect and avoid obstacles in
changing environments. Our architecture simplifies the obstacle-perception
problem to that of place-dependent change detection. We then extend the
behaviour of generic sample-based motion planners to better suit the
teach-and-repeat problem structure by introducing a new edge-cost metric paired
with a curvilinear planning space. The resulting planner generates naturally
smooth paths that avoid local obstacles while minimizing lateral path deviation
to best exploit prior terrain knowledge. While we use the method with VT&R, it
can be generalized to suit arbitrary path-following applications. Experimental
results from online run-time analysis, unit testing, and qualitative
experiments on a differential drive robot show the promise of the technique for
reliable long-term autonomous operation in complex unstructured environments.
Authors: Jordy Sehn, Yuchen Wu, Timothy D. Barfoot.
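As a concrete illustration of the curvilinear planning idea, below is a toy edge-cost sketch over states expressed in a path-aligned frame $(s, l)$, with $s$ the arc length along the taught path and $l$ the lateral offset. The function and weight are illustrative assumptions, not the paper's actual metric.

```python
import math

# Toy edge cost in a curvilinear (path-aligned) frame: a state is (s, l),
# with s the arc length along the taught path and l the lateral offset.
# The weight w_lat (an assumption) trades progress against deviation from
# previously traversed, known-safe terrain.
def edge_cost(a, b, w_lat=5.0):
    s0, l0 = a
    s1, l1 = b
    length = math.hypot(s1 - s0, l1 - l0)  # edge length in (s, l) coordinates
    lateral = 0.5 * (abs(l0) + abs(l1))    # mean lateral deviation on the edge
    return length * (1.0 + w_lat * lateral)

# A sampling-based planner (e.g., RRT*) using this cost prefers paths that
# hug the taught path (l = 0) unless an obstacle forces a detour.
print(edge_cost((0.0, 0.0), (1.0, 0.0)))   # on-path edge: cost 1.0
print(edge_cost((0.0, 0.0), (1.0, 0.5)))   # off-path edge: penalized
```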
The statistical and design considerations that pertain to
dose optimization are discussed. The sample size savings range from 16.6% to 27.3%,
depending on the design and scenario, with a mean savings of 22.1%.
The traditional more-is-better dose selection paradigm, developed based on
cytotoxic chemotherapeutics, is often problematic when applied to the
development of novel molecularly targeted agents (e.g., kinase inhibitors,
monoclonal antibodies, and antibody-drug conjugates). The US Food and Drug
Administration (FDA) initiated Project Optimus to reform the dose optimization
and dose selection paradigm in oncology drug development and to call for more
attention to benefit-risk considerations.
We systematically investigated the operating characteristics of the seamless
phase 2-3 design as a strategy for dose optimization, where in stage 1
(corresponding to phase 2) patients are randomized to multiple doses, with or
without a control; and in stage 2 (corresponding to phase 3) the efficacy of
the selected optimal dose is evaluated with a randomized concurrent control or
historical control. Depending on whether the concurrent control is included and
the type of endpoints used in stages 1 and 2, we describe four types of
seamless phase 2-3 dose-optimization designs, which are suitable for different
clinical settings. The statistical and design considerations that pertain to
dose optimization are discussed. Simulations show that phase 2-3
dose-optimization designs are able to control the familywise type I error
rates and yield appropriate statistical power with a substantially smaller
sample size than the conventional approach. The sample size savings range from 16.6% to 27.3%,
depending on the design and scenario, with a mean savings of 22.1%. Due to the
interim dose selection, the phase 2-3 dose-optimization design is logistically
and operationally more challenging, and should be carefully planned and
implemented to ensure trial integrity.
Authors: Liyun Jiang, Ying Yuan.
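To make the design concrete, here is a heavily simplified Monte Carlo sketch of one seamless phase 2-3 variant (binary endpoint, concurrent control, stage 2 analyzed on its own data). The selection rule, sample sizes, and test are illustrative assumptions; the paper's designs and their error control are more refined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: randomize patients across doses and pick the dose with the
# highest observed response rate. Stage 2: compare the selected dose with a
# concurrent control via a one-sided z-test on stage-2 data only.
def one_trial(p_doses, p_control, n1=30, n2=100):
    stage1 = [rng.binomial(n1, p) / n1 for p in p_doses]
    best = int(np.argmax(stage1))          # interim dose selection
    x_t = rng.binomial(n2, p_doses[best])  # stage-2 treatment arm
    x_c = rng.binomial(n2, p_control)      # stage-2 control arm
    p_t, p_c = x_t / n2, x_c / n2
    se = max(np.sqrt(p_t*(1-p_t)/n2 + p_c*(1-p_c)/n2), 1e-9)
    return (p_t - p_c) / se > 1.96         # reject H0 at one-sided ~2.5%

# Under the global null (all doses equal control), estimate the type I error.
null_rate = np.mean([one_trial([0.2, 0.2, 0.2], 0.2) for _ in range(5000)])
print(f"empirical type I error: {null_rate:.3f}")
```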
We significantly improve performance by using properties of the posterior
in our active learning scheme and in the definition of the GP prior. In
particular, we account for the expected dynamic range of the posterior in
different dimensionalities. We test our model against a number of synthetic and
cosmological examples.
We present the GPry algorithm for fast Bayesian inference of general
(non-Gaussian) posteriors with a moderate number of parameters. GPry does not
need any pre-training or special hardware such as GPUs, and is intended as a
drop-in replacement for traditional Monte Carlo methods for Bayesian inference.
Our algorithm is based on generating a Gaussian Process surrogate model of the
log-posterior, aided by a Support Vector Machine classifier that excludes
extreme or non-finite values. An active learning scheme allows us to reduce the
number of required posterior evaluations by two orders of magnitude compared to
traditional Monte Carlo inference. Our algorithm allows for parallel
evaluations of the posterior at optimal locations, further reducing wall-clock
times. We significantly improve performance by using properties of the posterior
in our active learning scheme and in the definition of the GP prior. In
particular, we account for the expected dynamic range of the posterior in
different dimensionalities. We test our model against a number of synthetic and
cosmological examples. GPry outperforms traditional Monte Carlo methods when
the evaluation time of the likelihood (or the calculation of theoretical
observables) is of the order of seconds; for evaluation times of over a minute
it can perform inference in days that would take months using traditional
methods. GPry is distributed as an open source Python package (pip install
gpry) and can also be found at https://github.com/jonaselgammal/GPry.
Authors: Jonas El Gammal, Nils Schöneberg, Jesús Torrado, Christian Fidler.
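For intuition, the following is a generic sketch of the GP-surrogate-plus-active-learning loop using scikit-learn; it is not GPry's actual API (for that, see the repository above), and the toy log-posterior and acquisition rule are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def log_post(x):                        # toy 1-D log-posterior
    return -0.5 * ((x - 1.0) / 0.3) ** 2

X = np.array([[-3.0], [0.0], [3.0]])    # initial evaluations
y = np.array([log_post(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                              normalize_y=True, alpha=1e-8)
for _ in range(10):
    gp.fit(X, y)                        # refit surrogate of the log-posterior
    cand = np.linspace(-4, 4, 400).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    acq = mu + 2.0 * sigma              # favour high posterior AND uncertainty
    x_new = cand[int(np.argmax(acq))]   # next point to evaluate exactly
    X = np.vstack([X, x_new])
    y = np.append(y, log_post(x_new[0]))

print(f"{len(y)} evaluations; surrogate peak near x = {X[np.argmax(y)][0]:.2f}")
```

In GPry itself, a Support Vector Machine classifier additionally screens out extreme or non-finite log-posterior values before they can corrupt the GP fit.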
We consider the fundamental scheduling problem of minimizing the sum of
weighted completion times on a single machine in the non-clairvoyant setting. However, to the best of our knowledge, this concept has never been considered
for the total completion time objective in the non-clairvoyant model. This implies
a performance guarantee of $(1+3\sqrt{3})\approx 6.197$ for the deterministic
algorithm and of $\approx 3.032$ for the randomized version.
We consider the fundamental scheduling problem of minimizing the sum of
weighted completion times on a single machine in the non-clairvoyant setting.
While no non-preemptive algorithm is constant competitive, Motwani, Phillips,
and Torng (SODA '93) proved that the simple preemptive round robin procedure is
$2$-competitive and that no better competitive ratio is possible, initiating a
long line of research focused on preemptive algorithms for generalized variants
of the problem. As an alternative model, Shmoys, Wein, and Williamson (FOCS
'91) introduced kill-and-restart schedules, where running jobs may be killed
and restarted from scratch later, and analyzed them for the makespan objective.
However, to the best of our knowledge, this concept has never been considered
for the total completion time objective in the non-clairvoyant model.
We contribute to both models: First we give for any $b > 1$ a tight analysis
for the natural $b$-scaling kill-and-restart strategy for scheduling jobs
without release dates, as well as for a randomized variant of it. This implies
a performance guarantee of $(1+3\sqrt{3})\approx 6.197$ for the deterministic
algorithm and of $\approx 3.032$ for the randomized version. Second, we show
that the preemptive Weighted Shortest Elapsed Time First (WSETF) rule is
$2$-competitive for jobs released in an online fashion over time, matching the
lower bound by Motwani et al. Using this result as well as the competitiveness
of round robin for multiple machines, we prove performance guarantees of
adaptations of the $b$-scaling algorithm to online release dates and unweighted
jobs on identical parallel machines.
Authors: Sven Jäger, Guillaume Sagnol, Daniel Schmidt genannt Waldschmidt, Philipp Warode.
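For intuition, a minimal sketch of a deterministic $b$-scaling kill-and-restart schedule for unweighted jobs without release dates: in round $k$, every unfinished job is granted a budget $b^k$ and is killed, losing all progress, if it does not finish. The round structure and job ordering here are simplified assumptions relative to the strategy analyzed in the paper.

```python
# Non-clairvoyant b-scaling kill-and-restart on a single machine.
def b_scaling(processing_times, b=2.0):
    remaining = dict(enumerate(processing_times))  # job -> true (unknown) size
    t, k, completion = 0.0, 0, {}
    while remaining:
        budget = b ** k
        for j, p in list(remaining.items()):
            if p <= budget:            # job finishes within this round's budget
                t += p
                completion[j] = t
                del remaining[j]
            else:                      # killed; restarted from scratch later
                t += budget
        k += 1
    return completion

comp = b_scaling([1.0, 3.0, 7.0])
print(sum(comp.values()))              # total (unweighted) completion time
```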
Frozen pretrained models have become a viable alternative to the
pretraining-then-finetuning paradigm for transfer learning. With this work, we hope to
bring greater attention to this promising path of freezing pretrained image
models.
Frozen pretrained models have become a viable alternative to the
pretraining-then-finetuning paradigm for transfer learning. However, with
frozen models there are relatively few parameters available for adapting to
downstream tasks, which is problematic in computer vision where tasks vary
significantly in input/output format and the type of information that is of
value. In this paper, we present a study of frozen pretrained models when
applied to diverse and representative computer vision tasks, including object
detection, semantic segmentation and video action recognition. From this
empirical analysis, our work answers the questions of what pretraining task
fits best with this frozen setting, how to make the frozen setting more
flexible to various downstream tasks, and the effect of larger model sizes. We
additionally examine the upper bound of performance using a giant frozen
pretrained model with 3 billion parameters (SwinV2-G) and find that it reaches
competitive performance on a varied set of major benchmarks with only one
shared frozen base network: 60.0 box mAP and 52.2 mask mAP on COCO object
detection test-dev, 57.6 val mIoU on ADE20K semantic segmentation, and 81.7
top-1 accuracy on Kinetics-400 action recognition. With this work, we hope to
bring greater attention to this promising path of freezing pretrained image
models.
Authors: Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao.
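A minimal PyTorch sketch of the frozen setting: all pretrained weights stay fixed and only a small task head is trained. The paper studies frozen Swin transformers on detection, segmentation, and video; the ResNet-50 classification head below is an assumption made purely for brevity.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(weights="IMAGENET1K_V2")
backbone.fc = nn.Identity()            # expose 2048-d features
for p in backbone.parameters():
    p.requires_grad = False            # freeze: no gradients, no updates
backbone.eval()                        # also freeze batch-norm statistics

head = nn.Linear(2048, 10)             # the only trainable parameters
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)

x = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 10, (4,))
with torch.no_grad():                  # backbone is a fixed feature extractor
    feats = backbone(x)
loss = nn.functional.cross_entropy(head(feats), labels)
loss.backward()
opt.step()
print(float(loss))
```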
We further develop the string $1/c^2$ expansion of closed bosonic string
theory, where $c$ is the speed of light. The expansion will be performed up to
and including the next-to-next-to-leading order (NNLO). Finally, we expand the phase space action, which allows us to perform
the Dirac procedure and pass to the quantum theory.
We further develop the string $1/c^2$ expansion of closed bosonic string
theory, where $c$ is the speed of light. The expansion will be performed up to
and including the next-to-next-to-leading order (NNLO). We show that the
next-to-leading order (NLO) theory is equal to the Gomis--Ooguri string,
generalised to a curved target space, provided the target space geometry admits
a certain class of co-dimension-2 foliations. We compute the energy of the
string up to NNLO for a flat target space with a circle that must be wound by
the string, and we show that it agrees with the $1/c^2$ expansion of the
relativistic energy. We also compute the algebra of Noether charges for a flat
target space and show that this matches order-by-order with an appropriate
expansion of the Poincar\'e algebra, which at NLO gives the string Bargmann
algebra. Finally, we expand the phase space action, which allows us to perform
the Dirac procedure and pass to the quantum theory. It turns out that the
Poisson brackets change at each order, and we show that the normal ordering
constant of the relativistic theory, which does not depend on $c$, can be
reproduced by the NLO and NNLO theories.
Authors: Jelle Hartong, Emil Have.
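Schematically, a $1/c^2$ expansion of this kind organizes the embedding fields and the action in even powers of $1/c$, so that NNLO means keeping terms two orders beyond the leading one. The normalizations below are illustrative assumptions, not necessarily the paper's conventions.

```latex
% Illustrative structure of a $1/c^2$ expansion (assumed normalizations):
\begin{align}
  X^\mu &= X^\mu_{(0)} + c^{-2} X^\mu_{(2)} + c^{-4} X^\mu_{(4)} + \mathcal{O}(c^{-6}), \\
  S &= c^{2} S_{\mathrm{LO}} + S_{\mathrm{NLO}} + c^{-2} S_{\mathrm{NNLO}} + \mathcal{O}(c^{-4}).
\end{align}
```

In this schematic, the NLO piece would correspond to the (curved-target) Gomis--Ooguri action described above.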
We also calculate the differential probability with respect to the final meson momenta and the probability that one or two of the final mesons recoil back towards the source. In the ultrarelativistic limit of the initial meson, the total probability tends to a constant, which we calculate analytically in the $\phi^4$ model. At this order the meson sector conserves energy on its own, while the incoming meson applies a positive pressure to the kink.
In a (1+1)-dimensional scalar quantum field theory, we calculate the
leading-order probability of meson multiplication, which is the inelastic
scattering process: kink + meson $\rightarrow$ kink + 2 mesons. We also
calculate the differential probability with respect to the final meson momenta
and the probability that one or two of the final mesons recoil back towards
the source. In the ultrarelativistic limit of the initial meson, the total
probability tends to a constant, which we calculate analytically in the
$\phi^4$ model. At this order the meson sector conserves energy on its own,
while the incoming meson applies a positive pressure to the kink. This is in
contrast with the situation in classical field theory, where Romanczukiewicz
and collaborators have shown that, in the presence of a reflectionless kink,
only meson fusion is allowed, resulting in a negative radiation pressure on the
kink.
Authors: Jarah Evslin, Hui Liu, Baiyang Zhang.
By sampling
random $\ell$-step trajectories of an unknown system, we build an abstraction
based on the notion of $\ell$-completeness. Our method is then tested on several numerical
benchmarks.
A common technique to verify complex logic specifications for dynamical
systems is the construction of symbolic abstractions: simpler, finite-state
models whose behaviour mimics that of the system of interest. Typically,
abstractions are constructed by exploiting accurate knowledge of the underlying
model: in real-life applications, this may be a costly assumption. By sampling
random $\ell$-step trajectories of an unknown system, we build an abstraction
based on the notion of $\ell$-completeness. We introduce the notion of
probabilistic behavioural inclusion and provide probably approximately correct
(PAC) guarantees that this abstraction includes all behaviours of the concrete
system, over finite and infinite time horizons, leveraging scenario theory
for non-convex problems. Our method is then tested on several numerical
benchmarks.
Authors: Rudi Coppola, Andrea Peruffo, Manuel Mazo Jr.
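A toy sketch of the sampling-based construction: discretize the outputs, record every observed window of $\ell$ consecutive symbols as an abstract state, and connect windows that overlap in $\ell-1$ symbols. The dynamics, discretization, and sample count below are assumptions, and the PAC/scenario-theory layer that certifies behavioural inclusion is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(x0, steps):               # stand-in for the unknown system
    xs = [x0]
    for _ in range(steps):
        xs.append(0.8 * xs[-1] + rng.normal(0.0, 0.05))
    return xs

def symbol(x, grid=0.25):              # output discretization
    return int(np.floor(x / grid))

ell, n_traj = 2, 500
states, edges = set(), set()
for _ in range(n_traj):
    traj = [symbol(x) for x in simulate(rng.uniform(-1, 1), ell + 3)]
    windows = [tuple(traj[i:i + ell]) for i in range(len(traj) - ell + 1)]
    states.update(windows)                    # abstract states = l-windows
    edges.update(zip(windows, windows[1:]))   # transitions share l-1 symbols

print(f"abstraction: {len(states)} states, {len(edges)} transitions")
```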
Our conclusions are in accordance with predictions based on the strong-field approximation.
We present experimental data on the strong field tunnel ionization of argon
in a counter-rotating two-color (CRTC) laser field. We find that the initial
momentum component along the tunneling direction changes sign between the
rising and the falling edges of the CRTC field. If the initial momentum at the
tunnel exit points in the direction of the ion at the instant of tunneling,
this manifests as an enhanced Coulomb interaction of the outgoing electron with
its parent ion. Our conclusions are in accordance with predictions based on
the strong-field approximation.
Authors: A. Geyer, D. Trabert, M. Hofmann, N. Anders, M. S. Schöffler, L. Ph. H. Schmidt, T. Jahnke, M. Kunitski, R. Dörner, S. Eckart.
The study of the polarization direction is crucial for reconstructing
the spatial structure of the magnetic field in active-galaxy parsec-scale
jets. Moreover, the local axis of a jet component may not coincide with its
direction of motion, which affects the observed polarization direction.
The study of the polarization direction is crucial for reconstructing the
spatial structure of the magnetic field in active-galaxy parsec-scale jets.
However, due to relativistic effects, the magnetic field projected onto the
celestial sphere in the source reference frame cannot be assumed to be
orthogonal to the observed direction of the electric vector in the wave.
Moreover, the local axis of a jet component may not coincide with its
direction of motion, which affects the observed polarization direction. In
this article, we analyze transverse-to-jet distributions of the electric
vector in the wave, obtained by modeling with different jet kinematic and
geometrical parameters, for a helical magnetic field with different twist
angles and for a toroidal magnetic field in the center, surrounded by a
sheath of varying thickness penetrated by a poloidal field. We find that:
1) the shape of the transverse distribution of the electric vector depends
in a complex way on the angles that the jet axis and the velocity vector
make with the line of sight; 2) the twist direction of the helical magnetic
field cannot be determined unambiguously from the distributions of the
electric vector in the wave alone; 3) both considered magnetic-field
topologies can reproduce both the ``spine-sheath'' polarization structure
and individual bright features whose polarization direction is longitudinal
to the jet axis.
Authors: Marina S. Butuzova.
The model describes not only ground-state scalar diquarks and pseudo-scalar mesons but also the excited pseudo-scalar diquarks and scalar mesons; each ground-state diquark (meson) has the corresponding excited diquark (hadron) with opposite parity as a chiral partner. Effects of chiral symmetry breaking and diquark condensates are incorporated by a mean-field treatment.
We investigate modifications of hadron masses at finite quark chemical
potential in two-flavor and two-color QCD, for which data are available from
lattice simulations, within a linear sigma model based on approximate
Pauli-Gursey $SU(4)$ symmetry. The model describes not only ground-state scalar
diquarks and pseudo-scalar mesons but also the excited pseudo-scalar diquarks
and scalar mesons; each ground-state diquark (meson) has the corresponding
excited diquark (hadron) with opposite parity as a chiral partner. Effects of
chiral symmetry breaking and diquark condensates are incorporated by a
mean-field treatment. We show that various mixings among the hadrons, which are
triggered by the breakdown of baryon number conservation in the superfluid
phase, lead to a rich hadron mass spectrum. We discuss the influence of the
$U(1)_A$ anomaly on the density dependence of the mass spectrum and also
manifestations of the chiral partner structures as density increases in the
superfluid phase. The predicted hadron masses are expected to provide future
lattice simulations with useful information on such symmetry properties in
dense two-color QCD.
Authors: Daiki Suenaga, Kotaro Murakami, Etsuko Itou, Kei Iida.
We find
training on these machine-translated prompts leads to better performance on
human-written prompts in the respective languages. We conjecture that the models are learning higher-level
capabilities that are both task- and language-agnostic. Our code, datasets and models are publicly
available at https://github.com/bigscience-workshop/xmtf.
Multitask prompted finetuning (MTF) has been shown to help large language
models generalize to new tasks in a zero-shot setting, but so far explorations
of MTF have focused on English data and models. We apply MTF to the pretrained
multilingual BLOOM and mT5 model families to produce finetuned variants called
BLOOMZ and mT0. We find finetuning large multilingual language models on
English tasks with English prompts allows for task generalization to
non-English languages that appear only in the pretraining corpus. Finetuning on
multilingual tasks with English prompts further improves performance on English
and non-English tasks leading to various state-of-the-art zero-shot results. We
also investigate finetuning on multilingual tasks with prompts that have been
machine-translated from English to match the language of each dataset. We find
training on these machine-translated prompts leads to better performance on
human-written prompts in the respective languages. Surprisingly, we find models
are capable of zero-shot generalization to tasks in languages they have never
intentionally seen. We conjecture that the models are learning higher-level
capabilities that are both task- and language-agnostic. In addition, we
introduce xP3, a composite of supervised datasets in 46 languages with English
and machine-translated prompts. Our code, datasets and models are publicly
available at https://github.com/bigscience-workshop/xmtf.
Authors: Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng-Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, Colin Raffel.
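A short usage sketch with Hugging Face transformers, assuming the public bigscience/bloomz-560m checkpoint (the smallest BLOOMZ variant); the prompt is illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloomz-560m"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# Zero-shot prompting: BLOOMZ follows instructions without task-specific tuning.
prompt = "Translate to English: Je t'aime."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=10)
print(tok.decode(out[0], skip_special_tokens=True))
```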
We transform the plain ViT into a hierarchical one with minimal changes. The code and models will be released at https://github.com/ViTAE-Transformer/HPViT.
Self-supervised pre-training of vision transformers (ViTs) via masked image
modeling (MIM) has proven very effective. However, customized algorithms
(e.g., GreenMIM) must be carefully designed for hierarchical ViTs, instead
of using the vanilla and simple MAE for the plain ViT. More importantly, since
these hierarchical ViTs cannot reuse the off-the-shelf pre-trained weights of
the plain ViTs, the requirement of pre-training them leads to a massive amount
of computational cost, thereby incurring both algorithmic and computational
complexity. In this paper, we address this problem by proposing a novel idea of
disentangling the hierarchical architecture design from the self-supervised
pre-training. We transform the plain ViT into a hierarchical one with minimal
changes. Technically, we change the stride of the linear embedding layer from 16 to
4 and add convolution (or simple average) pooling layers between the
transformer blocks, thereby reducing the feature size from 1/4 to 1/32
sequentially. Despite its simplicity, it outperforms the plain ViT baseline in
classification, detection, and segmentation tasks on ImageNet, MS COCO,
Cityscapes, and ADE20K benchmarks, respectively. We hope this preliminary study
could draw more attention from the community on developing effective
(hierarchical) ViTs while avoiding the pre-training cost by leveraging the
off-the-shelf checkpoints. The code and models will be released at
https://github.com/ViTAE-Transformer/HPViT.
Authors: Yufei Xu, Jing Zhang, Qiming Zhang, Dacheng Tao.
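A minimal sketch of the two architectural changes described above, with the transformer blocks stubbed out as identities for brevity; channel widths and stage sizes are illustrative assumptions (the released code keeps full block internals).

```python
import torch
import torch.nn as nn

class HierarchicalViTSketch(nn.Module):
    def __init__(self, dim=96, blocks_per_stage=2, num_stages=4):
        super().__init__()
        # Change 1: patch embedding with stride 4 instead of 16 (1/4 scale).
        self.embed = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.stages = nn.ModuleList(
            nn.Sequential(*[nn.Identity()      # plain ViT blocks would go here
                            for _ in range(blocks_per_stage)])
            for _ in range(num_stages))
        # Change 2: pooling between stages: 1/4 -> 1/8 -> 1/16 -> 1/32.
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        x = self.embed(x)
        feats = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            feats.append(x)                    # multi-scale feature maps
            if i < len(self.stages) - 1:
                x = self.pool(x)
        return feats

feats = HierarchicalViTSketch()(torch.randn(1, 3, 224, 224))
print([f.shape[-1] for f in feats])            # [56, 28, 14, 7]
```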
However, such complex models are ill-suited to actual
clinical environments with limited computing resources. Code is available at
https://github.com/JCruan519/MALUNet.
Recently, some pioneering works have preferred to apply more complex modules
to improve segmentation performance. However, such models are ill-suited to
actual clinical environments with limited computing resources. To address this
challenge, we propose a light-weight model that achieves competitive performance
for skin lesion segmentation at the lowest cost of parameters and computational
complexity so far. Briefly, we propose four modules: (1) DGA consists of
dilated convolution and gated attention mechanisms to extract global and local
feature information; (2) IEA, which is based on external attention to
characterize the overall datasets and enhance the connection between samples;
(3) CAB is composed of 1D convolution and fully connected layers to perform a
global and local fusion of multi-stage features to generate attention maps at
channel axis; (4) SAB, which operates on multi-stage features by a shared 2D
convolution to generate attention maps at spatial axis. We combine four modules
with our U-shape architecture and obtain a light-weight medical image
segmentation model dubbed MALUNet. Compared with UNet, our model improves
the mIoU and DSC metrics by 2.39% and 1.49%, respectively, with a 44x and 166x
reduction in the number of parameters and computational complexity. In
addition, we conduct comparison experiments on two skin lesion segmentation
datasets (ISIC2017 and ISIC2018). Experimental results show that our model
achieves a state-of-the-art balance between the number of parameters,
computational complexity, and segmentation performance. Code is available at
https://github.com/JCruan519/MALUNet.
Authors: Jiacheng Ruan, Suncheng Xiang, Mingye Xie, Ting Liu, Yuzhuo Fu.
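One hypothetical reading of the DGA idea (dilated convolution plus gated attention) is sketched below; the actual MALUNet modules are in the repository above, so treat this purely as an illustration.

```python
import torch
import torch.nn as nn

class DGASketch(nn.Module):
    """Dilated branch gathers wider context; a 1x1 sigmoid branch gates it."""
    def __init__(self, ch):
        super().__init__()
        self.context = nn.Conv2d(ch, ch, 3, padding=2, dilation=2)
        self.gate = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return self.context(x) * self.gate(x)  # gated attention

x = torch.randn(2, 16, 64, 64)
print(DGASketch(16)(x).shape)                  # torch.Size([2, 16, 64, 64])
```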
There is limited understanding of the information captured by deep spatiotemporal models in their intermediate representations. (iv) Most models converge to their culminating biases in the first half of training. We then explore how these biases affect performance on dynamically biased datasets. For AVOS, we design a better combination of fusion and cross connection layers compared with previous architectures.
There is limited understanding of the information captured by deep
spatiotemporal models in their intermediate representations. For example, while
evidence suggests that action recognition algorithms are heavily influenced by
visual appearance in single frames, no quantitative methodology exists for
evaluating such static bias in the latent representation compared to bias
toward dynamics. We tackle this challenge by proposing an approach for
quantifying the static and dynamic biases of any spatiotemporal model, and
apply our approach to three tasks, action recognition, automatic video object
segmentation (AVOS) and video instance segmentation (VIS). Our key findings
are: (i) Most examined models are biased toward static information. (ii) Some
datasets that are assumed to be biased toward dynamics are actually biased
toward static information. (iii) Individual channels in an architecture can be
biased toward static, dynamic or a combination of the two. (iv) Most models
converge to their culminating biases in the first half of training. We then
explore how these biases affect performance on dynamically biased datasets. For
action recognition, we propose StaticDropout, a semantically guided dropout
that debiases a model from static information toward dynamics. For AVOS, we
design a better combination of fusion and cross connection layers compared with
previous architectures.
Authors: Matthew Kowal, Mennatullah Siam, Md Amirul Islam, Neil D. B. Bruce, Richard P. Wildes, Konstantinos G. Derpanis.
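As a hypothetical illustration (not the paper's metric): one crude way to probe static bias is to compare a model's features on a real clip against a frozen clip that repeats a single frame, which removes all dynamics while keeping appearance. The stand-in network below is an assumption.

```python
import torch
import torch.nn as nn

def static_bias_score(model, clip):              # clip: (T, C, H, W)
    frozen = clip[:1].repeat(len(clip), 1, 1, 1) # kill motion, keep appearance
    with torch.no_grad():
        f_real = model(clip).mean(0)             # temporally pooled features
        f_frozen = model(frozen).mean(0)
    # Cosine similarity near 1 suggests the representation is mostly static.
    return torch.cosine_similarity(f_real, f_frozen, dim=0).item()

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # stand-in
clip = torch.randn(8, 3, 32, 32)
print(static_bias_score(model, clip))
```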