TypeScript and Flow are extensions of JavaScript that allow the programmer to annotate code with types, which are then used to statically type-check the program. For instance, the following function definition is valid in both languages:
\begin{alltt}\color{darkblue}
function foo(x\textcolor{darkred}{ : number | string}) \{
return (typeof(x) === "number")? x+1 : x.trim(); \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{foo}
\}
\end{alltt}
Apart from the type annotation (in red) of the function parameter, the above is
standard JavaScript code defining a function that checks whether
its argument is an integer; if so, it returns the argument's successor
(\code{x+1}); otherwise, it calls the method \code{trim()} of the
argument. The annotation specifies that the parameter is either a
number or a string (the vertical bar denotes a union type). If this annotation is respected and the function
is applied to either an integer or a string, then the application
cannot fail because of a type error (\code{trim()} is a string method of the ECMAScript 5 standard that
trims whitespace from the beginning and end of the string), and the type-checkers of both TypeScript
and Flow rightly accept this function. This is possible because both type-checkers
implement a specific type discipline called \emph{occurrence typing} or \emph{flow
typing}:\footnote{%
TypeScript calls it ``type guard recognition'' while Flow uses the terminology ``type
refinements''.}
as a matter of fact, standard type disciplines would reject this function.
The reason is that standard type disciplines would try to
type every part of the body of the function under the assumption that \code{x} has
type \code{number\,|\,string} and they would fail, since the successor is
not defined for strings and the method \code{trim()} is not defined
for numbers. This is so because standard disciplines do not take into account the type
test performed on \code{x}. Occurrence typing is the typing technique
that uses the information provided by the test to specialize---precisely, to \emph{refine}---the type
of the occurrences of \code{x} in the branches of the conditional: since the program tested that
\code{x} is of type \code{number}, then we can safely assume that
\code{x} is of type \code{number} in the ``then'' branch, and that it
is \emph{not} of type \code{number} (and thus deduce from the type annotation that it must be of type
\code{string}) in the ``else'' branch.
Occurrence typing was first defined and formally studied by
\citet{THF08} to statically type-check untyped Scheme programs, and later extended by \citet{THF10}
leading to the development of Typed Racket. From its inception,
occurrence typing was intimately tied to type systems with
set-theoretic types: unions, intersections, and negation of
types. Union was the first type connective to appear, since it was already used by
\citet{THF08} where its presence was needed to characterize the
different control flows of a type test, as our \code{foo} example
shows: one flow for integer arguments and another for
strings. Intersection types appear (in limited forms) combined with
occurrence typing both in TypeScript and in Flow and serve, among other things, to give
more precise types to functions such as \code{foo}. For instance,
since \code{x\,+\,1} evaluates to an integer and \code{x.trim()} to a string, then our
function \code{foo} has type
\code{(number|string)$\to$(number|string)}.
% \footnote{\label{null}Actually,
% both Flow and TypeScript deduce as return type
% \code{number|string|void}, since they track when operations may yield \code{void} results. Considering this would be easy but also clutter our presentation, which is why we omit
% such details.}
But it is clear that a more precise type would be
one that states that \code{foo} returns a number when it is applied to
a number and returns a string when it is applied to a string, so that the type deduced for, say, \code{foo(42)} would be \code{number} rather than \code{number|string}. This is
exactly what the \emph{intersection type}
\begin{equation}\label{eq:inter}
\code{(number$\to$number) \& (string$\to$string)}
\end{equation}
states (intuitively, an expression has an intersection of types, noted \code{\&}, if and only if it has all the types of the intersection) and corresponds in Flow to declaring \code{foo} as follows:
\begin{alltt}\color{darkblue}
var foo : \textcolor{darkred}{(number => number) & (string => string)} = x => \{
return (typeof(x) === "number")? x+1 : x.trim(); \refstepcounter{equation} \mbox{\color{black}\rm(\theequation)}\label{foo2}
\}
\end{alltt}
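As an aside, TypeScript expresses such intersections of arrows through \emph{overload signatures} rather than through an annotation like the one above; the following transliteration of \code{foo} (ours, given only as a sketch) should be accepted by recent versions of the language:

```typescript
// Sketch (our transliteration, not part of the original example):
// the two overload signatures play the role of the intersection type
// (number => number) & (string => string).
function foo(x: number): number;
function foo(x: string): string;
function foo(x: number | string): number | string {
  return typeof x === "number" ? x + 1 : x.trim();
}

const n = foo(42);        // statically given type number
const s = foo("  hi  "); // statically given type string
```

With these signatures the type deduced for \code{foo(42)} is \code{number}, matching what the intersection type \eqref{eq:inter} prescribes.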
As for negation types, they are pervasive in the occurrence
typing approach, even though they are used only at the meta-theoretic
level,\footnote{At the moment of writing there is a pending pull
request to add negation types to the syntax of TypeScript, but that is all.} in
particular to determine the type environment when the type case
fails. We already saw negation types at work when we informally typed the ``else''
branch in \code{foo}, for which we assumed that $x$ did \emph{not} have
type \code{number}---i.e., it had the (negation) type \code{$\neg$number}---and
deduced from it that $x$ then had type \code{string}---i.e.,
\code{(number|string)\&$\neg$number} which is equivalent to the set-theoretic difference
\code{(number|string)\textbackslash\,number} and, thus, to \code{string}.
The approaches cited above essentially focus on refining the type
of variables that occur in an expression whose type is being tested. They
do it when the variable occurs at top-level in the test (i.e., the variable is the
expression being tested) or in some specific positions such as in
nested pairs or at the end of a path of selectors. In this work we aim at removing this limitation on the
contexts and develop a general theory to refine the type of variables
that occur in tested expressions under generic contexts, such as variables occurring in the
left or the right expressions of an application. In other words, we aim at establishing a formal framework to
extract as much static information as possible from a type test. To do
so, we leverage the full-fledged set-theoretic type
connectives provided by the theory of semantic subtyping. Our analysis
will also yield two important byproducts. First, to refine the type of the
variables we have to refine the type of the expressions they occur in and
we can use this information to improve our analysis. Therefore our
occurrence typing approach will refine not only the types of variables
but also the types of generic expressions (bypassing usual type
inference). Second, and most importantly, the result of our analysis can be used to infer
intersection types for functions, even in the absence of precise type
annotations such as the one in the definition of \code{foo} in~\eqref{foo2}: to put it simply, we are able to
infer the type~\eqref{eq:inter} for the unannotated pure JavaScript
code of \code{foo} (i.e., no type annotation at all), while in TypeScript and Flow (and any other
formalism we are aware of)
this requires an explicit and full type annotation as the one given in~\eqref{foo2}.
Finally, the natural targets for occurrence typing are languages with
dynamic type tests, in particular, dynamic languages. To type such
languages occurrence typing is often combined not only, as discussed
above, with set-theoretic types, but also with extensible record types
(to type objects) and gradual typing (to combine static and
dynamic typing), two features that we study in
Section~\ref{sec:extensions} as two extensions of our core formalism. Of
particular interest is the latter. \citet{Gre19} singles out
occurrence typing and gradual typing as \emph{the} two ``lineages'' that
partition the research on combining static and dynamic typing: he
identifies the former as the ``pragmatic,
implementation-oriented dynamic-first'' lineage and the latter as the
``formal, type-theoretic, static-first'' lineage. Here we demonstrate that
these two ``lineages'' are not orthogonal or mutually independent, and
we combine occurrence and gradual typing showing, in particular, how
the former can be used to optimize the compilation of the latter.
\subsection{Motivating examples}
We focus our study on conditionals that test types and consider the following syntax:
\(
\ifty{e}{t}{e}{e}
\) (e.g., in this syntax the body of \code{foo} in \eqref{foo} and \eqref{foo2} is rendered as
\(
\ifty{x}{\Int}{x+1}{(\textsf{trim } x)}
\)).
In particular, in this introduction we concentrate on applications, since they constitute the most difficult case and many other cases can be reduced to them. A typical example is the expression
\begin{equation}\label{typical}
\ifty{x_1x_2}{t}{e_1}{e_2}
\end{equation}
where $x_i$'s denote variables, $t$ is some type, and $e_i$'s are generic expressions.
Depending on the actual $t$ and on the static types of $x_1$ and $x_2$, we
can make type assumptions for $x_1$, for $x_2$, \emph{and} for the application $x_1x_2$
when typing $e_1$ that are different from those we can make when typing
$e_2$. For instance, suppose $x_1$ is bound to the function \code{foo} defined in \eqref{foo2}. Thus $x_1$ has type $(\Int\to\Int)\wedge(\String\to\String)$ (we used the syntax of the types of Section~\ref{sec:language} where unions and intersections are denoted by $\vee$ and $\wedge$ and have priority over $\to$ and $\times$, but not over $\neg$).
Then, it is not hard to see that if $x_2:\Int{\vee}\String$, then the expression\footnote{This and most of the following expressions are just given for the sake of example. Determining the type \emph{in each branch} of expressions other than variables is interesting for constructors but less so for destructors such as applications, projections, and selections: any reasonable programmer would not repeat the same application twice; (s)he would store its result in a variable. This becomes meaningful with constructors such as pairs, as we do for instance in the expression in~\eqref{pair}.}
%
\begin{equation}\label{mezzo}
\texttt{let }x_1 \texttt{\,=\,}\code{foo}\texttt{ in } \ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{\texttt{42}}
\end{equation}
%
is well typed with type $\Int$: when typing the branch ``then'' we
know that the test $x_1x_2\in \Int$
succeeded and that, therefore, not
only is $x_1x_2$ of type \Int, but also $x_2$ is of type $\Int$: the other possibility,
$x_2:\String$, would have made the test fail.
For~\eqref{mezzo} we reasoned only on the type of the variables in the ``then'' branch but we can do the same
on the ``else'' branch as shown by the following expression, where \code{@} denotes string concatenation
\begin{equation}\label{two}
\ifty{x_1x_2}{\Int}{((x_1x_2)+x_2)}{((x_1x_2)\code{\,@\,}x_2)}
\end{equation}
If the static type of
$x_1$ is $(\Int\to\Int)\wedge(\String\to\String)$ then $x_1x_2$ is well
typed only if the static type of $x_2$ is (a subtype of)
$\Int\vee\String$ and from that it is not hard to deduce that~\eqref{two}
has type $\Int\vee\String$. Let us see this in detail. The expression in~\eqref{two} is
typed in the following type environment:
$x_1:(\Int\to\Int)\wedge(\String\to\String), x_2:\Int\vee\String$. All we
can deduce, then, is that the application $x_1x_2$ has type
$\Int\vee\String$, which is not enough to type either the ``then'' branch
or the ``else'' branch. In order to type the ``then'' branch
$(x_1x_2)+x_2$ we must be able to deduce that both $x_1x_2$ and $x_2$
are of type \Int. Since we are in the ``then'' branch, we know that
the type test succeeded and that, therefore, $x_1x_2$ has type \Int. Thus
we can assume in typing this branch that $x_1x_2$ has both its static
type and type \Int{} and, thus, their intersection:
$(\Int\vee\String)\wedge\Int$, that is \Int. For what concerns $x_2$ we
use the static type of $x_1$, that is
$(\Int\to\Int)\wedge(\String\to\String)$, and notice that this function
returns an \Int\ only if its argument is of type \Int. Reasoning as
above we thus deduce that in the ``then'' branch the type of $x_2$ is
the intersection of its static type with \Int:
$(\Int\vee\String)\wedge\Int$, that is \Int. To type the ``else'' branch
we reason exactly in the same way, with the only difference that, since
the type test has failed, we know that the type of the tested expression is
\emph{not} \Int. That is, the expression $x_1x_2$ can produce any possible value
barring an \Int. If we denote by \Any\ the type of all values (i.e., the
type \code{any} of TypeScript and Flow) and by
$\setminus$ the set difference, then this means that in the else branch we
know that $x_1x_2$ has type $\Any{\setminus}\Int$---written
$\neg\Int$---that is, it can return values of any type barring \Int. Reasoning as for the ``then'' branch we then assume that
$x_1x_2$ has type $(\Int\vee\String)\wedge\neg\Int$ (i.e., $(\Int\vee\String)\setminus\Int$, that is, \String), that
$x_2$ must be of type \String\ for the application to have type
$\neg\Int$ and therefore we assume that $x_2$ has type
$(\Int\vee\String)\wedge\String$ (i.e., again \String).
We have seen that we can specialize in both branches the type of the
whole expression $x_1x_2$, the type of the argument $x_2$,
but what about the type of the function $x_1$? Well, this depends on the type of
$x_1$ itself. In particular, if instead of an intersection type $x_1$
is typed by a union type (e.g., when the function bound to $x_1$ is
the result of a branching expression), then the test may give us information about
the type of the function in the various branches. So for instance if in the expression
in~\eqref{typical} $x_1$ is of type, say, $(s_1\to t)\vee(s_2\to\neg t)$, then we can
assume for the expression~\eqref{typical} that $x_1$ has type $(s_1\to t)$ in the branch ``then'' and
$(s_2\to \neg t)$ in the branch ``else''.
As a more concrete example, if
$x_1:(\Int{\vee}\String\to\Int)\vee(\Bool{\vee}\String\to\Bool)$ and
$x_1x_2$ is well-typed, then we can deduce for
\begin{equation}\label{exptre}
\ifty{x_1x_2}{\Int}{(x_1(x_1x_2)+42)}{\texttt{not}(x_1(x_1x_2))}
\end{equation}
the type $\Int\vee\Bool$: in the ``then'' branch $x_1$ has type
$\Int{\vee}\String\to\Int$ and $x_1x_2$ is of type $\Int$; in the
``else'' branch $x_1$ has type $\Bool{\vee}\String\to\Bool$ and
$x_1x_2$ is of type $\Bool$.
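This refinement of a union of arrows can be sketched on a deliberately tiny model of types (ours, purely for illustration, far from the semantic subtyping framework we actually use: here a type is just a finite set of base atoms). Refining $x_1$ then amounts to keeping the summands whose codomain intersects the tested type:

```typescript
// Toy model (an illustration only, not the paper's formalism):
// a "type" is a finite set of base atoms, and a union-of-arrows
// type is an array of arrows.
type Base = "Int" | "String" | "Bool";
interface Arrow { dom: Set<Base>; cod: Set<Base>; }

// True if the two atom sets intersect.
function meets(s: Set<Base>, t: Set<Base>): boolean {
  let m = false;
  s.forEach(b => { if (t.has(b)) m = true; });
  return m;
}

// Keep the summands of a union of arrows that may return a result
// in t: in the positive branch of a test `x1 x2 in t`, x1 can be
// refined to exactly these summands.
function refineUnion(t1: Arrow[], t: Set<Base>): Arrow[] {
  return t1.filter(a => meets(a.cod, t));
}

// x1 : (Int v String -> Int) v (Bool v String -> Bool)
const x1: Arrow[] = [
  { dom: new Set<Base>(["Int", "String"]), cod: new Set<Base>(["Int"]) },
  { dom: new Set<Base>(["Bool", "String"]), cod: new Set<Base>(["Bool"]) },
];
const pos = refineUnion(x1, new Set<Base>(["Int"]));
// ¬Int is {String, Bool} in this three-atom universe:
const neg = refineUnion(x1, new Set<Base>(["String", "Bool"]));
```

Here \code{pos} retains only the first summand and \code{neg} only the second, mirroring the types deduced for $x_1$ in the two branches of \eqref{exptre}.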
Let us recap. If $e$ is an expression of type $t_0$ and we are trying to type
%
\(\ifty{e}{t}{e_1}{e_2}\),
%
then we can assume that $e$ has type $t_0\wedge t$ when typing $e_1$
and type $t_0\setminus t$ when typing $e_2$. If furthermore $e$ is of
the form $e'e''$, then we may also be able to specialize the types for $e'$ (in
particular if its static type is a union of arrows) and for $e''$ (in
particular if the static type of $e'$ is an intersection of
arrows). Additionally, we can repeat the reasoning for all subterms of $e'$
and $e''$ as long as they are applications, and deduce distinct types for all subexpressions of $e$ that
form applications. How to do this precisely---not only for applications, but also for other terms such as pairs, projections, records, etc.---is explained in the rest of
the paper but the key ideas are pretty simple and are presented next.
\subsection{Key ideas}\label{sec:ideas}
First of all, in a strict language we can consider a type as denoting the set of values of that type and subtyping as set-containment of the denoted values.
Imagine we are testing whether the result of an application $e_1e_2$
is of type $t$ or not, and suppose we know that the static types of
$e_1$ and $e_2$ are $t_1$ and $t_2$ respectively. If the application $e_1e_2$ is
well typed, then there is a lot of useful information that we can deduce from it:
first, that $t_1$ is a functional type (i.e., it denotes a set of
well-typed $\lambda$-abstractions, the values of functional type) whose domain, denoted by $\dom{t_1}$, is a type denoting the set of all values that are accepted by any function in
$t_1$; second that $t_2$ must be a subtype of the domain of $t_1$;
third, we also know the type of the application, that is the type
that denotes all the values that may result from the application of a
function in $t_1$ to an argument in $t_2$, type that we denote by
$t_1\circ t_2$. For instance, if $t_1=\Int\to\Bool$ and $t_2=\Int$,
then $\dom{t_1}=\Int$ and $t_1\circ t_2 = \Bool$. Notice that introducing operations such as $\dom{}$ and $\circ$ is redundant when working with simple types, but becomes necessary in the presence of set-theoretic types. If for instance $t_1$ is the type of \eqref{foo2}, that is, \(t_1=(\Int{\to}\Int)\) \(\wedge\) \((\String{\to}\String)\),
%---which denotes functions that return an integer if applied to an integer and a string if applied to a string---
then
\( \dom{t_1} = \Int \vee \String \),
that is the union of all the possible
input types, while the precise return type of such a
function depends on the type of the argument the function is applied to:
either an integer, or a string, or both (i.e., the union type
\(\Int\vee\String\)). So we have \( t_1 \circ \Int = \Int \),
\( t_1 \circ \String = \String \), and
\( t_1 \circ (\Int\vee\String) = \Int \vee \String \) (see Section~\ref{sec:typeops} for the formal definition of $\circ$).
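To make these operators concrete, here is a tiny executable model (our own simplification, not the formal definition of Section~\ref{sec:typeops}: types are finite sets of base atoms and $t_1$ is an intersection of arrows with pairwise disjoint domains):

```typescript
// Toy model of dom( ) and the application operator (our simplification,
// for illustration only): a type is a finite set of base atoms, and a
// function type t1 is a finite intersection of arrows with pairwise
// disjoint domains, represented as an array.
type Base = "Int" | "String" | "Bool";
interface Arrow { dom: Set<Base>; cod: Set<Base>; }

// True if the two atom sets intersect.
function meets(s: Set<Base>, t: Set<Base>): boolean {
  let m = false;
  s.forEach(b => { if (t.has(b)) m = true; });
  return m;
}

// dom(t1): the union of the domains of all the arrows.
function domOf(t1: Arrow[]): Set<Base> {
  const d = new Set<Base>();
  t1.forEach(a => a.dom.forEach(b => d.add(b)));
  return d;
}

// t1 o t2: the union of the codomains of the arrows whose domain
// meets t2 (valid here because the domains are pairwise disjoint).
function apply(t1: Arrow[], t2: Set<Base>): Set<Base> {
  const r = new Set<Base>();
  t1.forEach(a => { if (meets(a.dom, t2)) a.cod.forEach(b => r.add(b)); });
  return r;
}

// t1 = (Int -> Int) /\ (String -> String), the type of foo:
const t1: Arrow[] = [
  { dom: new Set<Base>(["Int"]), cod: new Set<Base>(["Int"]) },
  { dom: new Set<Base>(["String"]), cod: new Set<Base>(["String"]) },
];
// dom(t1) = Int v String; t1 o Int = Int; t1 o (Int v String) = Int v String
```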
What we want to do
is to refine the types of $e_1$ and $e_2$ (i.e., $t_1$ and $t_2$) for the cases where the test
that $e_1e_2$ has type $t$ succeeds or fails. Let us start with refining the type $t_2$ of $e_2$ for the case in
which the test succeeds. Intuitively, we want to remove from $t_2$ all
the values for which the application will surely return a result not
in $t$, thus making the test fail. Consider $t_1$ and let $s$ be the
largest subtype of $\dom{t_1}$ such that%
\svvspace{-1.29mm}
\begin{equation}\label{eq1}
t_1\circ s\leq \neg t
\end{equation}
In other terms, $s$ contains all the legal arguments that make any function
in $t_1$ return a result not in $t$. Then we can safely remove from
$t_2$ all the values in $s$ or, equivalently, keep in $t_2$ all the
values of $\dom{t_1}$ that are not in $s$. Let us implement the second
viewpoint: the set of all elements of $\dom{t_1}$ for which an
application \emph{does not} surely give a result in $\neg t$ is
denoted $\worra{t_1}t$ (read, ``$t_1$ worra $t$'') and defined as $\min\{u\leq \dom{t_1}\alt
t_1\circ(\dom{t_1}\setminus u)\leq\neg t\}$: it is easy to see that
according to this definition $\dom{t_1}\setminus(\worra{t_1} t)$ is
the largest subset of $\dom{t_1}$ satisfying \eqref{eq1}. Then we can
refine the type of $e_2$ for when the test is successful by using the
type $t_2\wedge(\worra{t_1} t)$: we intersect all the possible results
of $e_2$, that is $t_2$, with the elements of the domain that
\emph{may} yield a result in $t$, that is $\worra{t_1} t$. When the test fails,
the type of $e_2$ can be refined in a similar way just by replacing $t$ by $\neg t$:
we get the refined type $t_2\land(\worra{t_1}{\neg t})$. To sum up, to refine the type of an
argument in the test of an application, all we need is to define
$\worra{t_1} t$, the set of arguments that when applied to a function
of type $t_1$ \emph{may} return a result in $t$; then we can refine the
type of $e_2$ as $t_2^+ \eqdeftiny t_2\wedge(\worra{t_1} t)$ in the ``then'' branch (we call it the \emph{positive} branch)
and as $t_2^- \eqdeftiny t_2\setminus(\worra{t_1} t)$ in the ``else'' branch (we call it the \emph{negative} branch).
As a side remark, note
that the set $\worra{t_1} t$ is different from the set of elements that return a
result in $t$ (though it is a supertype of it). To see that, consider
for $t$ the type \String{} and for $t_1$ the type $(\Bool \to\Bool)\wedge(\Int\to(\String\vee\Int))$,
that is, the type of functions that when applied to a Boolean return a
Boolean and when applied to an integer return either an integer or a
string; then we have that $\dom{t_1}=\Int\vee\Bool$ and $\worra{t_1}\String=\Int$,
but there is no (non-empty) type that ensures that an application of a
function in $t_1$ will surely yield a $\String$ result.
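On the same kind of simplified model as above (ours, for illustration only: intersections of arrows with pairwise disjoint domains over finite sets of base atoms), $\worra{t_1} t$ can be computed by collecting the domains of the arrows whose codomain intersects $t$; the definition for general set-theoretic types is the one given above:

```typescript
// Toy computation of worra(t1, t) on a simplified model: t1 is an
// intersection of arrows with pairwise disjoint domains, and a type
// is a finite set of base atoms. Illustration only, not the general
// definition used in the paper.
type Base = "Int" | "String" | "Bool";
interface Arrow { dom: Set<Base>; cod: Set<Base>; }

// True if the two atom sets intersect.
function meets(s: Set<Base>, t: Set<Base>): boolean {
  let m = false;
  s.forEach(b => { if (t.has(b)) m = true; });
  return m;
}

// worra(t1, t): the arguments that *may* yield a result in t, i.e.
// the union of the domains of the arrows whose codomain meets t.
function worra(t1: Arrow[], t: Set<Base>): Set<Base> {
  const s = new Set<Base>();
  t1.forEach(a => { if (meets(a.cod, t)) a.dom.forEach(b => s.add(b)); });
  return s;
}

// t1 = (Bool -> Bool) /\ (Int -> String v Int), as in the side remark:
const t1: Arrow[] = [
  { dom: new Set<Base>(["Bool"]), cod: new Set<Base>(["Bool"]) },
  { dom: new Set<Base>(["Int"]), cod: new Set<Base>(["String", "Int"]) },
];
```

On this $t_1$ the function yields $\Int$ for $t=\String$, as in the side remark: only $\Int$ arguments \emph{may} produce a \String{} result, even though no argument type guarantees one.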
Once we have determined $t_2^+$, it is then not very difficult to refine the
type $t_1$ for the positive branch, too. If the test succeeded, then we know two facts: first,
that the function was applied to a value in $t_2^+$ and, second, that
the application did not diverge and returned a result in
$t$. Therefore, we can exclude from $t_1$ all the functions that, when applied to an argument
in $t_2^+$, yield a result not in $t$.
This can be obtained simply by removing from $t_1$ the functions in
$t_2^+\to \neg t$, that is, we refine the type of $e_1$ in the ``then'' branch as
$t_1^+=t_1\setminus (t_2^+\to\neg t)$.
Note that this also removes functions diverging on $t_2^+$ arguments.
In particular, the interpretation of a type $t\to s$ is the set of all functions that when applied to an argument of type $t$
either diverge or return a value in $s$. As such the interpretation of $t\to s$ contains
all the functions that diverge (at least) on $t$. Therefore removing $t\to s$ from a type $u$ removes from $u$
not only all the functions that when applied to a $t$ argument return a result in $s$, but also all the functions that diverge on $t$.
Ergo $t_1\setminus (t_2^+\to
\neg t)$ removes, among others, all functions in $t_1$ that diverge on $t_2^+$.
Let us see all this on our example \eqref{exptre}, in particular, by showing how this technique deduces that the type of $x_1$ in the positive branch is (a subtype of) $\Int{\vee}\String\to\Int$.
Take the static type of $x_1$, that is $(\Int{\vee}\String\to\Int)\vee(\Bool{\vee}\String\to\Bool)$ and intersect it with
$(t_2^+\to \neg t)$, that is, $\String\to\neg\Int$. Since intersection distributes over unions we
obtain
\svvspace{-1mm}
%
\[((\Int{\vee}\String{\to}\Int)\wedge\neg(\String{\to}\neg\Int))\vee((\Bool{\vee}\String{\to}\Bool)\wedge\neg(\String{\to}\neg\Int))\svvspace{-1mm}\]
%
and since
$(\Bool{\vee}\String{\to}\Bool)\wedge\neg(\String{\to}\neg\Int)$ is empty
(because $\String\to\neg\Int$ contains $\Bool{\vee}\String\to\Bool$),
then what we obtain is the left summand, a strict subtype of $(\Int{\vee}\String)\to\Int$, namely the functions of type $\Int{\vee}\String{\to}\Int$ minus those
that diverge on all \String{} arguments.
This is essentially what we formalize in Section~\ref{sec:language}, in the type system by the rule \Rule{PAppL} and in the typing algorithm with the case \eqref{due} of the definition of the function \constrf.
\subsection{Technical challenges}\label{sec:challenges}
In the previous section we outlined the main ideas of our approach to occurrence typing. However, the devil is in the details. The formalization we give in Section~\ref{sec:language} is thus not as smooth as just outlined: we must introduce several auxiliary definitions to handle some corner cases. This section presents, through small examples, the main technical difficulties we had to overcome and the definitions we introduced to handle them. As such it provides a kind of road-map for the technicalities of Section~\ref{sec:language}.
\paragraph{Typing occurrences} As it should be clear by now, not only variables
but also generic expressions are given different types in the ``then'' and
``else'' branches of type tests. For instance, in \eqref{two} the expression
$x_1x_2$ has type \Int{} in the positive branch and type \String{} in the negative
one. In this specific case it is possible to deduce these typings from the
refined types of the variables (in particular, thanks to the fact that $x_2$ has
type \Int{} in the positive branch and \String{} in the negative one), but this is
not possible in general. For instance, consider $x_1:\Int\to(\Int\vee\Bool)$,
$x_2:\Int$, and the expression
\svvspace{-1mm}
\begin{equation}\label{twobis}
\ifty{x_1x_2}{\Int}{...x_1x_2...}{...x_1x_2...}\svvspace{-1mm}
\end{equation}
It is not possible to specialize the type of the variables in the
branches. Nevertheless, we want to be able to deduce that $x_1x_2$ has
type \Int{} in the positive branch and type \Bool{} in the negative
one. In order to do so in Section~\ref{sec:language} we will use
special type environments that map not only variables but also generic
expressions to types. So to type, say, the positive branch of
\eqref{twobis} we extend the current type environment with the
hypothesis that the expression $x_1x_2$ has type \Int{}.
When we test the type of an expression we try to deduce the type of
some subexpressions occurring in it. Therefore we must cope with
subexpressions occurring multiple times. A simple example is given by
using product types and pairs as in
$\ifty{(x,x)}{\pair{t_1}{t_2}}{e_1}{e_2}$. It is easy to see that the
positive branch $e_1$ is selected only if $x$ has type $t_1$
\emph{and} type $t_2$, and to deduce from this that $x$ must be typed in
$e_1$ by their intersection, $t_1\wedge t_2$. To deal with multiple
occurrences of the same subexpression the type inference system of
Section~\ref{sec:language} will use the classic rule for introducing intersections \Rule{Inter}, while the algorithmic counterpart will use the operator $\Refine{}{}{}$ that
intersects the static type of an expression with all the types deduced
for the multiple occurrences of it.
\paragraph{Type preservation} We want our type system to be sound in the sense of~\citet{Wright1994}, that is, that it
satisfies progress and type preservation. The latter property is
challenging because, as explained just above, our type
assumptions are not only about variables but also about
expressions. Two corner cases are particularly difficult. The first is
shown by the following example\svvspace{-.9mm}
\begin{equation}\label{bistwo}
\ifty{e(42)}{\Bool}{e}{...} \svvspace{-.9mm}
\end{equation}
If $e$ is an expression of type $\Int\to t$, then, as discussed before,
the positive branch will have type $(\Int\to t)\setminus(\Int\to\neg
\Bool)$. If furthermore the negative branch is of the same type (or of a subtype), then this
will also be the type of the whole expression in \eqref{bistwo}. Now imagine
that the application $e(42)$ reduces to a Boolean value, then the whole
expression in \eqref{bistwo} reduces to $e$; but this has type
$\Int\to t$ which, in general, is \emph{not} a subtype of $(\Int\to
t)\setminus(\Int\to\neg\Bool)$, and therefore the type is not preserved by the reduction. To cope with this problem, the proof
of type preservation (see \Appendix\ref{app:subject-reduction}) resorts to \emph{type schemes}, a
technique introduced by~\citet{Frisch2008} to type expressions by
sets of types, so that the expression in \eqref{bistwo} will have both the types at issue.
The second corner case is a modification of the example above
where the positive branch is $e(42)$, e.g.,
$\ifty{e(42)}{\Bool}{e(42)}{\textsf{true}}$. In this case the type deduced for the
whole expression is \Bool, while after reduction we would obtain
the expression $e(42)$ which is not of type \Bool{} but of type $t$ (even though it will
eventually reduce to a \Bool). This problem will be handled in the
proof of type preservation by considering parallel reductions (e.g., if
$e(42)$ reduces in a step to, say, $\textsf{false}$, then
$\ifty{e(42)}{\Bool}{e(42)}{\textsf{true}}$ reduces in one step to
$\ifty{\textsf{false}}{\Bool}{\textsf{false}}{\textsf{true}}$): see \Appendix\ref{app:parallel}.
\paragraph{Interdependence of checks} The last class of technical problems arises
from the mutual dependence of different type checks. In particular, there are two cases
that pose a problem. The first can be shown by two functions $f$ and $g$ both of type $(\Int\to\Int)\wedge(\Any\to\Bool)$, $x$ of type $\Any$ and the test:
\begin{equation}\label{nest1}
\ifty{(f\,x,g\,x)}{\pair\Int\Bool}{\,...\,}{\,...}
\end{equation}
If we independently check $f\,x$ against $\Int$ and $g\,x$ against $\Bool$
we deduce $\Int$ for the first occurrence of $x$ and $\Any$ for the
second. Thus we would type the positive branch of \eqref{nest1} under the hypothesis
that $x$ is of type $\Int$. But if we use the hypothesis generated by
the test of $f\,x$, that is, that $x$ is of type \Int, to check $g\,x$ against \Bool,
then the type deduced for $x$ is $\Empty$---i.e., the branch is never selected. In other words, we want to produce type
environments for occurrence typing by taking into account all
the available hypotheses, even when these hypotheses are formulated later
in the flow of control. This will be done in the type systems of
Section~\ref{sec:language} by the rule \Rule{Path} and will require at
algorithmic level to look for a fix-point solution of a function, or
an approximation thereof.
Finally, a nested check may help refine
the type assumptions on some outer expressions. For instance, when typing
the positive branch $e$ of\svvspace{-.9mm}
\begin{equation}\label{pair}
\ifty{(x,y)}{(\pair{(\Int\vee\Bool)}\Int)}{e}{...}\svvspace{-.9mm}
\end{equation}
we can assume that the expression $(x,y)$ is of type
$\pair{(\Int\vee\Bool)}\Int$ and put it in the type environment. But
if in $e$ there is a test like
$\ifty{x}{\Int}{{\color{darkred}(x,y)}}{(...)}$, then we do not want to use
the assumption in the type environment to type the expression $(x,y)$
occurring in the inner test (in red). Instead we want to give to that occurrence of the expression
$(x,y)$ the type $\pair{\Int}\Int$. This will be done by temporarily
removing the type assumption about $(x,y)$ from the type environment and
by retyping the expression without that assumption (see rule
\Rule{Env\Aa} in Section~\ref{sec:algorules}).
\iflongversion
\subsubsection*{Outline}
\else
\subsubsection*{Outline and Contributions}
\fi
In Section~\ref{sec:language} we formalize the
ideas we just presented: we define the types and
expressions of our system, their dynamic semantics and a type system that
implements occurrence typing together with the algorithms that decide
whether an expression is well typed or not. Section~\ref{sec:extensions} extends our formalism to record
types and presents two applications of our analysis: the inference of
arrow types for functions and a static analysis to reduce the number
of casts inserted by a compiler of a gradually-typed
language. Practical aspects are discussed in
Section~\ref{sec:practical} where we give several paradigmatic examples of code typed by our prototype implementation, that can be interactively tested at
\ifsubmission
the (anonymized) site
\fi
\url{https://occtyping.github.io/}. Section~\ref{sec:related} presents
related work.
\iflongversion
A discussion of future work concludes this
presentation.
\fi
%
To ease the presentation all the proofs are
omitted from the main text and can be found in the appendix
\ifsubmission
joint to the submission as supplemental material.
\else
available online.
\fi
\iflongversion
\subsubsection*{Contributions}
\fi
\noindent The main contributions of our work can be summarized as follows:
\begin{itemize}[left=0pt .. \parindent,nosep]
\item We provide a theoretical framework to refine the type of
expressions occurring in type tests, thus removing the limitations
of current occurrence typing approaches which require both the tests and the refinement
to take place on variables.
\item We define a type-theoretic approach alternative to the
current flow-based approaches. As such it provides different
results and it can be thus profitably combined with flow-based techniques.
\item We use our analysis for defining a formal framework that
reconstructs intersection types for unannotated or
partially-annotated functions, something that, to our knowledge, no
other current system can do.
\item We prove the soundness of our system. We define type inference
algorithms that we prove to be sound, and show several completeness results
which in practice yield the completeness of any reasonable implementation.
\item We show how to extend our approach to records with field
addition, update, and deletion operations.
\item We show how occurrence typing can be extended to and combined with
gradual typing and apply our results to optimize the compilation of the
latter.
% \item We show how it is possible to extend our framework to
% exploit the richer information provided by polymorphic types.
\end{itemize}
We end this introduction by stressing the practical
implications of our work: a perfunctory inspection may give the
wrong impression that the only interest of the heavy formalization
that follows is to have generic expressions, rather than just variables, in type
cases: this would be a bad trade-off. The important point is,
instead, that our formalization is what makes analyses such as those
presented in Section~\ref{sec:extensions} possible (e.g., the
reconstruction of the type~\eqref{eq:inter} for the unannotated pure
JavaScript code of \code{foo}), which is where the actual added
practical value and potential of our work reside.