## Binary to Decimal Conversion in Limited Precision
Part of
the Arithmetic Tutorial Collection
Copyright © 1999, Douglas. W. Jones, with major revisions made in 2002 and 2010. This work may be transmitted or stored in electronic form on any computer attached to the Internet or World Wide Web so long as this notice is included in the copy. Individuals may make single copies for their own use. All other rights are reserved. |

- Introduction
- The Basic Idea
- Reduction to Code
- Dealing with Signed Numbers
- Living Within 8 Bits
- Doing Without Hardware Multiply and Divide
- Alternative Radices
- 64-bit Conversion on a 32-bit Machine

Rated as a top 50 site by Programmer's Heaven as of Feb 2002. |

Programmers on binary computers have always faced computational problems when dealing with textual input-output because of our convention that printed numeric data should be represented in base 10. These problems have never been insurmountable, but they are particularly annoying when writing programs for machines with limited capacity in their arithmetic units. Suppose, for example, you have an unsigned binary number that you wish to print out in decimal form. The following C code will do this:

void putdec( unsigned int n ) { unsigned int rem = n % 10; unsigned int quot = n / 10; if (quot > 0) putdec( quot ); putchar( rem + '0' ); }

The problem with this C code is that it assumes that the computer's arithmetic unit is sufficiently large to handle the entire number n, and it assumes that the arithmetic unit can conveniently do integer division yielding both a quotient and a remainder. These assumptions are not always reasonable, both because many commercially successful computers do not include division hardware and because, even when such hardware is available, it is of little direct use when dividing numbers larger than the ALU can handle.

The focus of this tutorial is on writing fast code to convert unsigned 16 bit binary numbers to decimal on a machine with an 8-bit ALU and no divide instruction. This situation is common on small microcontrollers, and the techniques generalize in useful ways, for example, to the problem of doing 64 bit binary to decimal conversion using a 32 bit ALU. A final section of this tutorial examines how these methods can be extended to larger numbers, for example, converting a 64-bit binary number to decimal on a machine with a 32-bit ALU.

On machines with no divide instruction where speed is no issue, it may be better to simply use repeated subtraction instead of the fast but algebraically complex code given in the body of this tutorial. Consider the following 16-bit conversion code as an example.

void putdec( int n ) { char a, b, c, d; if (n < 0) { putchar( '-' ); n = -n; } for ( a = '0' - 1; n >= 0; n -= 10000 ) ++a; for ( b = '9' + 1; n < 0; n += 1000 ) --b; for ( c = '0' - 1; n >= 0; n -= 100 ) ++c; for ( d = '9' + 1; n < 0; n += 10 ) --d; putchar( a ); putchar( b ); putchar( c ); putchar( d ); putchar( n + '0' ); }

Variants on the clever algorithm given above have been very popular on 8-bit microprocessors and microcontrollers, despite the fact that conversions can require over 30 iterations of the for loops in this code. As given here, this code does not work for the extreme negative value -32768; fixing this bug without introducing a special case is an interesting exercise.

To do the conversion faster, what we'll do is look at our binary number
as a base 16 number and then do the conversion in terms of the base 16
digits.
Consider a 16 bit number *n*. This can be broken into 4 fields of
4 bits each, the base 16 digits; call them *n*_{3},
*n*_{2}, *n*_{1} and *n*_{0}.

n |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _| |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_| | n | n | n | n | 3 2 1 0

The value of the number can be expressed in terms of these 4 fields as follows:

n= 4096n_{3}+ 256n_{2}+ 16n_{1}+ 1n_{0}

In the same way, if the value of *n*, expressed in decimal, is
*d*_{4}*d*_{3}*d*_{2}*d*_{1}*d*_{0}, where each of
*d*_{4}, *d*_{3}, *d*_{2},
*d*_{1} and *d*_{0} are decimal digits, the
value of n can be expressed as:

n= 10000d_{4}+ 1000d_{3}+ 100d_{2}+ 10d_{1}+ 1d_{0}

Our problem then, is to compute the *d*_{i} from the
*n*_{i} without dealing directly with the larger
number *n*.

To do this, we first note that the factors used in combining the values of
*n*_{i} can themselves be expressed as sums of multiples of
powers of 10. Therefore, we can rewrite the original expression as:

n=

n_{3}( 4×1000 + 0×100 + 9×10 + 6×1 ) +

n_{2}( 2×100 + 5×10 + 6×1 ) +

n_{1}( 1×10 + 6×1 ) +

n_{0}( 1×1 )

If distribute the *n*_{i} over the expressions for each factor,
then factor out the multiples of 100, we get the following:

n=

1000 (4n_{3}) +

100 (0n_{3}+ 2n_{2}) +

10 (9n_{3}+ 5n_{2}+ 1n_{1}) +

1 (6n_{3}+ 6n_{2}+ 6n_{1}+ 1n_{0})

We can use this to arrive at first approximations *a*_{i}
for *d*_{3} through *d*_{0}:

a_{3}= 4n_{3}a_{2}= 0n_{3}+ 2n_{2}a_{1}= 9n_{3}+ 5n_{2}+ 1n_{1}a_{0}= 6n_{3}+ 6n_{2}+ 6n_{1}+ 1n_{0}

The values of *a*_{i} are not proper decimal digits because
they are not properly bounded in the range
0 *<* *a*_{i} *<* 9.
Instead, given that each
of *n*_{i} is bounded in the range
0 *<* *n*_{i} *<* 15,
the *a*_{i}
are bounded as follows:

0

<a_{3}<60

0<a_{2}<30

0<a_{1}<225

0<a_{0}<285

This is actually quite promising because *a*_{3} through
*a*_{1} are less than 256 and may therefore be computed using
an 8 bit ALU, and even *a*_{0} may be computed in 8 bits if we
can use the carry-out bit of the ALU to hold the high bit. Furthermore,
if our interest is 2's complement arithmetic where the high bit is the
sign bit, the first step in the output process is to print a minus sign
and then negate, so the constraint on *n*_{3} becomes
0 *<* *n*_{3} *<* 8
and we can conclude naively that

0

<a_{0}<243

Note that we said
*n*_{3} *<* 8
and not
*n*_{3} < 8;
this is because of the one negative number in the 2's complement system
that cannot be properly negated, -32768. If we exclude this number from
the range of legal values, the upper bound
is reduced to 237; in fact, this is the overall upper bound, because
in the one case where
*n*_{3} is 8, all of the other
*n*_{i} are zero, so we can assert globally that

0

<a_{0}<237

Even with our naive bound, however, it is easy to see that we can
safely use 8 bit arithmetic to accumulate all of the
*a*_{i}.

Given that we have computed the *a*_{i}, we can push them
into the correct ranges by a series of 8-bit divisions and one 9-bit division,
as follows:

c_{1}=a_{0}/ 10d_{0}=a_{0}mod 10

c_{2}= (a_{1}+c_{1}) / 10d_{1}= (a_{1}+c_{1}) mod 10

c_{3}= (a_{2}+c_{2}) / 10d_{2}= (a_{2}+c_{2}) mod 10

c_{4}= (a_{3}+c_{3}) / 10d_{3}= (a_{3}+c_{3}) mod 10

d_{4}=c_{4}

In the above, the *c*_{i} terms represent carries propagated
from one digit position to the next. The upper bounds on each of the
*c*_{i} can be computed from the bounds given above for the
*a*_{i}:

c_{1}<28 [ = 285 / 10 ]c_{2}<25 [ = (28 + 225) / 10 = 253 / 10 ]c_{3}<5 [ = (25 + 30) / 10 = 55 / 10 ]c_{4}<6 [ = (5 + 60) / 10 = 65 / 10 ]

In computing the bound on *c*_{2}, we came within 2 of
to 255, the maximum that can be represented in 8 bits, so an 8 bit
ALU is just barely sufficient for this computation. Again, if we
negate any negative numbers prior to output, so that the maximum
value of *n*_{3} is 8, the bound on *c*_{1}
becomes 23 instead of 28.

A first hack at reducing the above idea to code might include named temporaries
for all of the terms used above, but we can avoid this if we reorganize
things slightly and note that, after computing each digit of the number, we
no-longer need one of the *n*_{i}. This leads to the
following C code:

void putdec( uint16_t n ) { uint8_t d4, d3, d2, d1, q; uint16_t d0; d0 = n & 0xF; d1 = (n>>4) & 0xF; d2 = (n>>8) & 0xF; d3 = (n>>12) & 0xF; d0 = 6*(d3 + d2 + d1) + d0; q = d0 / 10; d0 = d0 % 10; d1 = q + 9*d3 + 5*d2 + d1; q = d1 / 10; d1 = d1 % 10; d2 = q + 2*d2; q = d2 / 10; d2 = d2 % 10; d3 = q + 4*d3; q = d3 / 10; d3 = d3 % 10; d4 = q; putchar( d4 + '0' ); putchar( d3 + '0' ); putchar( d2 + '0' ); putchar( d1 + '0' ); putchar( d0 + '0' ); }

In the above,
we use the standard C types `uint8_t` and `uint16_t`
to represent unsigned 8 and 16-bit values. By convention, these should
be defined in `<stdint.h>`.
Of course, this code prints all 5 digits of *n* instead of only
printing the significant digits, but this can be fixed in many ways. The
following fix is perhaps the cleanest, making effective use of partial
results as soon as those results can be used to predict that digits will
be zero. Even so, adding these features to the code obscures the
computation being performed.

void putdec( uint16_t n ) { uint8_t d4, d3, d2, d1, q; uint16_t d0; d0 = n & 0xF; d1 = (n>>4) & 0xF; d2 = (n>>8) & 0xF; d3 = (n>>12); d0 = 6*(d3 + d2 + d1) + d0; q = d0 / 10; d0 = d0 % 10; d1 = q + 9*d3 + 5*d2 + d1; if (d1 != 0) { q = d1 / 10; d1 = d1 % 10; d2 = q + 2*d2; if ((d2 != 0) || (d3 != 0)) { q = d2 / 10; d2 = d2 % 10; d3 = q + 4*d3; if (d3 != 0) { q = d3 / 10; d3 = d3 % 10; d4 = q; if (d4 != 0) { putchar( d4 + '0' ); } putchar( d3 + '0' ); } putchar( d2 + '0' ); } putchar( d1 + '0' ); } putchar( d0 + '0' ); }

To print signed values, we can replace the first few lines of the above code with the following:

void putdec( int16_t n ) { uint8_t d4, d3, d2, d1, d0, q; if (n < 0) { putchar( '-' ); n = -n; } d0 = n & 0xF; d1 = (n>>4) & 0xF; d2 = (n>>8) & 0xF; d3 = (n>>12) & 0xF; /* the remainder of the code is unchanged */

In the above, we use the standard C type `int16_t`
to represent signed 16-bit values and `uint8_t`
to represent unsigned 8-bit values. By convention, these should
be defined in `<stdint.h>`.
As mentioned earlier, forcing *n* positive prior to breaking it down
into nibbles allows us to use an 8-bit type for `d0` instead
of the 16-bit type used previously (only 9 bits were actually needed).

Note that the computation of `d3` has also changed as a result of
the change from unsigned to signed arithmetic. This is because the
right-shift operator shifts zeros into the high bit when applied to
unsigned numbers, but it replicates the sign bit when applied to signed
numbers. There is one case in 16 bit 2's complement arithmetic where
both *n* and -*n* are negative, that is, *n* = -32768.
The change we have made gives us the correct output in this case.

In the code given above for unsigned printing, the fact that
*a*_{0} could be as large as 285 forced us to declare the
variable `d0` as a 16-bit integer instead of just 8 bits.
We can avoid this by observing that the final addition in the sum
used to compute *a*_{0} is the only one that will push
this sum over 255, and therefore, we need only check the carry bit once
to detect overflow. Having detected overflow, we can account for it in
the computation that follows:

void putdec( uint16_t n ) { uint8_t d4, d3, d2, d1, d0, q; d0 = n & 0xF; d1 = (n>>4) & 0xF; d2 = (n>>8) & 0xF; d3 = (n>>12) & 0xF; if ( (6*(d3 + d2 + d1) + d0) > 255 ) { /* overflow */ d0 = (6*(d3 + d2 + d1) + d0) & 0xFF; q = d0/10 + 25; d0 = d0 % 10 + 6; if (d0 >= 10) { d0 = d0 - 10; q = q + 1; } } else { /* no overflow */ d0 = 6*(d3 + d2 + d1) + d0; q = d0 / 10; d0 = d0 % 10; }

In the above code, after having noted an overflow in the computation of
*a*_{0}, we retain the least significant bits of the sum
in `d0` and then approximately correct the quotient by
adding 256/10, or 25; correcting this and computing the correct
remainder is a bit more interesting.

Given some integer *i* *>* 256, if we want to compute
*i* mod 10 from (*i* - 256) mod 10, we need to note that
the modulus operator is distributive in an interesting sense:

(

i+j) modm= ((imodm) + (jmodm)) modm

Using this, we can conclude that:

imod 10 = (((i- 256) mod 10) + (256 mod 10)) mod 10

= (((i- 256) mod 10) + 6) mod 10

In the case where ((*i* - 256) mod 10) + 6 is greater than 10,
the truncation of the quotient was also incorrect, so we must add 1
to the approximate quotient as well. This justifies the code given above,
and this code has also been tested exhaustively for all integers from
0 to 65535.

If flat-out speed is not of the essence, the easiest way to do the required divisions is to use the trivial repeated subtraction algorithm. The largest dividend we must consider is only 285, so it will take no more than 28 iterations for this division.

/* quotient and remainder returned by div10 */ unsigned char q, r; void div10( uint8_t i ) { q = 0; r = i; while (r > 9) { q = q + 1; r = r - 10; } }

This can be optimized in several ways. Most notably, termination conditions that compare with zero are almost always significantly faster than those that compare with other constants, so we might write something like this:

/* quotient and remainder returned by div10 */ uint8_t q, r; void div10( uint8_t i ) { q = -1; r = i; do { q = q + 1; r = r - 10; } while (r >= 0); r = r + 10; }

Another section of this tutorial covers the problem of doing multiplicaton and division using only shift and add instructions. We use the results presented there, taking specific advantage of the limited ranges of values occupied by our dividends, to completely eliminate iteration from our code.

There are two ways to get an exact quotient and remainder from
*i*/10 using reciprocal multiplication in 8 bits.
Either we multiply by 0.00011001 to get an approximate quotient and then
compute the remainder and correct the quotient, or we multiply by
0.00011001101 to get an exact quotient from which we can compute the
remainder.

The former method, yielding an approximation, can be reduced to the following C code:

/* quotient and remainder returned by div10 */ uint8_t q, r; void div10( uint8_t i ) { /* approximate the quotient as q = i*0.000110011 (binary) */ q = ((i>>1) + i) >> 1; /* times 0.11 */ q = ((q>>4) + q) >> 3; /* times 0.000110011 */ /* compute the reimainder as r = i - 10*q */ r = ((q<<2) + q) << 1; /* times 1010. */ r = i - r; /* fixup if approximate remainder out of bounds */ if (r >= 10) { r = r - 10; q = q + 1; } }

In first two right-shift-and-add statements used above, we have assumed that the carry out of the high bit of the add operation is shifted into the high bit of the second of the two right-shift operations used in each statement. This can be done efficiently in many machine instruction sets, but even good optimizing compilers may insist on using a 16 bit accumulator for all intermediate results of arithmetic operations.

The second approach, using a more elaborate multiplier to give an exact quotient, may be coded as follows:

/* quotient and remainder returned by div10 */ uint8_t q, r; void div10( uint8_t i ) { /* approximate the quotient as q = i*0.00011001101 (binary) */ q = ((i>>2) + i) >> 1; /* times 0.101 */ q = ((q ) + i) >> 1; /* times 0.1101 */ q = ((q>>2) + i) >> 1; /* times 0.1001101 */ q = ((q ) + i) >> 4; /* times 0.00011001101 */ /* compute the reimainder as r = i - 10*q */ r = ((q<<2) + q) << 1; /* times 1010. */ r = i - r; }

The particular choice of one or the other of these schemes can only be made after compiling them or hand translation to machine code. Depending on the specific features offered by the target architecture, either of the above may be the fastest or most compact, and careful use of small optimizations may tilt the balance one way or the other.

Depending on the space-time tradeoffs of the application, the above
code may be expanded in-line as an open macro, or it may be called as a
closed function. If in-line expansion is appropriate, it should be noted
that the computations of *c*_{3} and *c*_{4}
involve dividends no greater than 65. For dividends up to 68, the
multiplier 0.0001101 gives exact results, suggesting the following version
of the print routine:

void putdec( int16_t n ) { uint8_t d4, d3, d2, d1, d0, q; if (n < 0) { putchar( '-' ); n = -n; } d1 = (n>>4) & 0xF; d2 = (n>>8) & 0xF; d3 = (n>>12) & 0xF; d0 = 6*(d3 + d2 + d1) + (n & 0xF); q = (d0 * 0xCD) >> 11; d0 = d0 - 10*q; d1 = q + 9*d3 + 5*d2 + d1; q = (d1 * 0xCD) >> 11; d1 = d1 - 10*q; d2 = q + 2*d2; q = (d2 * 0x1A) >> 8; d2 = d2 - 10*q; d3 = q + 4*d3; d4 = (d3 * 0x1A) >> 8; d3 = d3 - 10*d4; putchar( d4 + '0' ); putchar( d3 + '0' ); putchar( d2 + '0' ); putchar( d1 + '0' ); putchar( d0 + '0' ); }

All multiplications in the above code can be reduced to short sequences of shift and add instructions, and it is straightforward to add the code discussed earlier to suppress leading zeros from this code.

All of these details are incorporated into the attached decimal print routine for the 14-bit family of PIC microcontrollers.

Note:The following section was added between August 6 and 8, 2012.

All of the above work was based on dividing the 16-bit number to be printed into four 4-bit chunks. That is, we worked in radix 16. In fact, there is no reason to work in a fixed radix. Consider, for example, breaking a 16-bit number into 4 chunks as follows:

n |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _| |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_| | n | n | n | n | 3 2 1 0

n= 8192n_{3}+ 512n_{2}+ 32n_{1}+ 1n_{0}

This leads to the following initial approximations for the decimal digits of the number:

a_{3}= 8n_{3}a_{2}= 1n_{3}+ 5n_{2}a_{1}= 9n_{3}+ 1n_{2}+ 3n_{1}a_{0}= 2n_{3}+ 2n_{2}+ 2n_{1}+ 1n_{0}

Shifting to a mixed radix system instead of pure base 16 has the net effect of changing the range of values in each of initial approximations for the decimal digits:

0

<a_{3}<65

0<a_{2}<95

0<a_{1}<133

0<a_{0}<105

Enlarging *n*_{0} to 5 bits had no major effect because the
multiplier on that term is 1. Shifting the other terms over one place
changed 3 multiplers from 6 to 12, resulting in smaller terms.
As a result, 8-bit arithmetic now suffices for all of the computations.
Reducing this to C code, we get:

void putdec( uint16_t n ) { uint8_t d4, d3, d2, d1, d0, q; d0 = n & 0x1F; d1 = (n>>5) & 0xF; d2 = (n>>9) & 0xF; d3 = (n>>13) & 0x7; d0 = 2*(d3 + d2 + d1) + d0; q = (d0 * (uint16_t)0x67) >> 10; d0 = d0 - 10*q; d1 = 9*d3 + d2 + 3*d1 + q; q = (d1 * (uint16_t)0x67) >> 10; d1 = d1 - 10*q; d2 = d3 + 5*d2 + q; q = (d2 * (uint16_t)0x67) >> 10; d2 = d2 - 10*q; d3 = 8*d3 + q; q = (d3 * (uint16_t)0x1A) >> 8; d3 = d3 - 10*q; putchar(q + '0'); putchar(d3 + '0'); putchar(d2 + '0'); putchar(d1 + '0'); putchar(d0 + '0'); }

Enlarging *n*_{0} to 5 bits had a second consequence. Because
the upper bounds on the larger terms have been reduced, shortened
approximations for one tenth can be used. On small
microcontrollers without hardware support for multiplication, this can
result in savings of a few instructions per multiply.
The multiplier 67_{16} works because 0.0001100111 approximates one
tenth to sufficient precision to divide any number less than 179 by ten.
The final digit, `d3`, can still be computed
using the even shorter approximation we used previously.

Note:The following section was added on June 8, 2010. As of June 10, the constants given in this section and the code at the end have been checked and appear to be correct. As of August 13, this code has undergone extensive testing.

The techniques explored above are applicable wherever a large integer must be converted to decimal on a machine with a smaller ALU. For example, consider the problem of converting 64-bit integers to decimal on a machine with a 32-bit ALU. This practical problem comes up, for example, in the BIOS code for Intel-architecture machines, where the code must not assume a 64-bit CPU but must still be able to report 64-bit values when it reports on such things as the sizes of the attached disk drives. When 64-bit arithmetic is supported on a 32-bit machine, it is slow enough that significant speedups can be achieved by avoiding it.

In general, 64-bit values may have up to 20 decimal digits:

2

^{64}= 18,446,744,073,709,551,616

Normally, we group decimal digits in groups of 3 using commas, as illustrated above. This is equivalent to writing the number in base 1000, where each base 1000 digit is represented with 3 consecutive decimal digits. Because there are 20 digits, grouping them as 5 blocks of 4 makes sense, suggesting that we should work in base 10,000. In addition, 10,000 is the largest power of 10 that can be represented in 16 bits (half the precision of our ALU). Therefore, we will write:

2

^{64}= 1844,6744,0737,0955,1616

We begin by breaking our 64-bit number into 16-bit chunks, just as we
began the 16-bit conversion by breaking the number into 4-bit chunks.
Again, we will the number *n* and we will call the 16-bit chunks
*n*_{3}, *n*_{2}, *n*_{1}
and *n*_{0}.

n |___ ___ ___ ___ ___ ___ ___ ___| 64-bits |___|___|___|___|___|___|___|___| 8 bytes | n | n | n | n | 4 16-bit chunks 3 2 1 0

The value of the number can be expressed in terms of these 4 fields as follows:

n= 2^{48}n_{3}+ 2^{32}n_{2}+ 2^{16}n_{1}+ 2^{0}n_{0}

n=

281,474,976,710,656n_{3}+

4,294,967,296n_{2}+

65,536n_{1}+

1n_{0}

Because we are interested in working in base 10,000, we will regroup the digits above and continue with the following version:

n=

281,4749,7671,0656n_{3}+

42,9496,7296n_{2}+

6,5536n_{1}+

1n_{0}

Our goal is to express *n* in base 10,000 as
*d*_{4}*d*_{3}*d*_{2}*d*_{1}*d*_{0}.
If each of *d*_{4}, *d*_{3}, *d*_{2},
*d*_{1} and *d*_{0} is a single base 10,000
digit, the value of n can be expressed as:

n=

1,0000,0000,0000,0000d_{4}+

1,0000,0000,0000d_{3}+

1,0000,0000d_{2}+

1,0000d_{1}+

1d_{0}

Our initial approximations of the decimal digits comes from rearranging the base 10,000 digits of the multipliers for the 16-bit chunks to conform with the above decimal presentation:

n=

1,0000,0000,0000 ( 281n_{3}) +

1,0000,0000 ( 4749n_{3}+ 42n_{2}) +

1,0000 ( 7671n_{3}+ 9496n_{2}+ 6n_{1}) +

1 ( 0656n_{3}+ 7296n_{2}+ 5536n_{1}+ 1n_{0})

The approximations for the four least significant base 10,000 digits fall out of the above:

a_{3}= 281n_{3}a_{2}= 4749n_{3}+ 42n_{2}a_{1}= 7671n_{3}+ 9496n_{2}+ 6n_{1}a_{0}= 0656n_{3}+ 7296n_{2}+ 5536n_{1}+ 1n_{0}

At this point, just as we did with the 4-bit example, we should check the
range of values we might encounter in each approximation. Here, our
concern is that some of the approximations might exceed the capacity of a
32-bit ALU. The maximum value of each term will occur when all of the
*n*_{i} are at their maximum value, which is 65,535.

0

<a_{3}<18,415,335

0<a_{2}<313,978,185

0<a_{1}<1,125,432,555

0<a_{0}<884,001,615

All of our approximations are comfortably less than 2 billion, which means that we can safely use 32-bit signed division, if that is all that is available to us.

We are now ready to compute the final values of the digits by a series of 32-bit divisions, as follows:

c_{1}=a_{0}/ 10,000d_{0}=a_{0}mod 10,000

c_{2}= (a_{1}+c_{1}) / 10,000d_{1}= (a_{1}+c_{1}) mod 10,000

c_{3}= (a_{2}+c_{2}) / 10,000d_{2}= (a_{2}+c_{2}) mod 10,000

c_{4}= (a_{3}+c_{3}) / 10,000d_{3}= (a_{3}+c_{3}) mod 10,000

d_{4}=c_{4}

In the above, the *c*_{i} terms represent carries propagated
from one digit position to the next. The upper bounds on each of the
*c*_{i} can be computed from the bounds given above for the
*a*_{i}:

c_{1}<88,400 [ =a_{0}/ 10,000 = 884,001,615 / 10,000 ]c_{2}<112,552 [ = (a_{1}+c_{1}) / 10,000 = 1,125,520,955 / 10,000 ]c_{3}<31,409 [ = (a_{2}+c_{2}) / 10,000 = 314,090,737 / 10,000 ]c_{4}<1,844 [ = (a_{3}+c_{3}) / 10,000 = 18,446,744 / 10,000 ]

Note something encouraging: The largest intermediate value encountered above is 1,125,520,955, which is safely less than 2 billion and therefore safe to represent as either a signed or unsigned integer on a 32-bit machine.

Note also that the bound on *c*_{4} is a small but useful
check on our arithmetic. The value, 1,844, is identical to the first 4 decimal
digits of 2**64, which is exactly what we expect. This does not eliminate
the possibility that the above might contain many small clerical errors.

The code to print the 64-bit number ends up essentially identical to the code given previouly to print a 16-bit number, except for the sizes of the variables, the values of the various constants, and the code to print each digit of the result. If we have not made any errors in computing the various constants, the following C code should work:

void putdec( uint64_t n ) { uint32_t d4, d3, d2, d1, d0, q; d0 = n & 0xFFFF; d1 = (n>>16) & 0xFFFF; d2 = (n>>32) & 0xFFFF; d3 = (n>>48) & 0xFFFF; d0 = 656 * d3 + 7296 * d2 + 5536 * d1 + d0; q = d0 / 10000; d0 = d0 % 10000; d1 = q + 7671 * d3 + 9496 * d2 + 6 * d1; q = d1 / 10000; d1 = d1 % 10000; d2 = q + 4749 * d3 + 42 * d2; q = d2 / 10000; d2 = d2 % 10000; d3 = q + 281 * d3; q = d3 / 10000; d3 = d3 % 10000; d4 = q; printf( "%4.4u", d4 ); printf( "%4.4u", d3 ); printf( "%4.4u", d2 ); printf( "%4.4u", d1 ); printf( "%4.4u", d0 ); }

In the above code, we used the standard types
`uint32_t` and `uint64_t` to declare
32 and 64-bit unsigned integer variables. By convention, these should
be defined in `<stdint.h>`. Note that
if you are working on a 32-bit machine with no support for 64-bit arithemtic,
there is no guarantee that you will be able to declare the 64-bit type
and you may have to fake it, for example, by representing each 64-bit
value as a pair of 32-bit components.

This code uses `printf()` to print each 4-digit base 10,000 digit
of the result. Of course, this code prints all 20 digits of *n*.
Suppressing leading zeros takes
a bit more work than it did with the 4-bit nibble code because of the
complexity of suppressing leading zeros within the printed representation
of the first nonzero base 10,000 digit.

I don't know where I first encountered the basic idea for the 8-bit code presented here. This tutorial began as an attempt to reinvent that code after I found myself working on a machine with an 8-bit ALU and needing to do 16-bit conversions.

Thanks to Mark Ramsey, who, on January 22, 2007 pointed out two small typos with big consequences in the section on

Living Within 8 Bits.Thanks to Dennis Vlasenko, who, on June 15, 2007 pointed out more typos in the section on

Doing Without Multiply and Divide, and for his work integrate the 16-bit code into the Linux distribution ofvsprintf().Thanks to Joe Zbiciak, who, on August 23, 2007 pointed out a small error the introduction.

Thanks to Derick Moore, who, on June 10, 2010, reported that he had checked the constants and tested the code in the section on

64-bit conversion on a 32-bit machineand verified that it worked in the BIOS he was maintaining.Thanks to Michal Nazarewicz for additional testing reported on August 13, 2010 and for his work to integrate the 64-bit code into the Linux distribution of

vsprintf().Thanks to George Spelvin, who on August 3, 2012, suggested that there is no reason that all of the chunks should be the same size, leading to the addition of the section on other radices.

Thanks to Ralph Corderoy, who on Septeber 10, 2014, pointed out the weakness of the repeated subtraction code when presented with the value -32768.