William Dally's Profile | Stanford Profiles

Logic Simulation Algorithms for Pipelined Hardware Architectures Hardware Accelerators for Electrical CAD Agrawal, P., Dally, W. J., Tutundjian, R. edited by Ambler, T., Agrawal, P. 1988.

Program Chair’s Message Dally, W. J.

The Reconfigurable Arithmetic Processor Fiske, S., Dally, W. J.

Message-Driven Processor Architecture: Version 11 Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J., Nuth, P.

Stanford University Concurrent VLSI Architecture Memo 124 Elastic Buffer Networks-on-Chip Michelogiannakis, G., Balfour, J., Dally, W. J.

Spills, Fills, and Kills Erez, M., Towles, B. P., Dally, W. J.

Conference Author/Panelist Index Dally, W. J., Aoki, N., Bai, X., Banerjee, K., Benini, L., Bergamaschi, R.

SSCS Members Honored as 2002 IEEE Fellows Banu, M., Burghartz, J. N., Dally, W. J., Dean, M. E., Gielen, G. G., Griffin, E. L.

IEEE MICRO 1998 ANNUAL INDEX, VOL. 18 Burns Dally, W. J., Adams, J., Alt, P. M., Arai, T., Arakawa, F., Avresky, D. R. ; 66: 79

CIMI FÍITIIt Dally, W. J., Balfour, J., Black-Schaffer, D., Chen, J., Harting, R. C., Parikh, V.

AI Memo No. 1272 April 26, 1994 Spertus, E., Dally, W. J.

ISSCC 2004/SESSION 7/TD: SCALING TRENDS/7.1 Horowitz, M., Dally, W.

ARVLSI’97 Committees Dally, W. J., Brown, R. B., Ishii, A. T., Papaefthymiou, M. C., Mudge, T. N., June, C. S.

ISSCC 2007/SESSION 24/MULTI-GB/s TRANSCEIVERS/24.3 Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T.

Globally Adaptive Load-Balanced Routing on k-ary n-cubes Singh, A., Dally, W. J., Towles, B., Gupta, A. K.

IEEE Fellows Lead the Engineering Profession Dally, W. J., Agha, G. A., Babic, H. I., Basu, S., Beausoleil, W. F., Bertino, E.

1987 INDEX, VOLUME 4 Dally, W. J., Agrawal, P.

6 Guest Editors’ Introduction: Top Picks from the 2008 Computer Architecture Conferences Joel Emer and Dean Tullsen 10 Larrabee: A Many-Core x86 Architecture Dally, W. J., Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M.

2010 Reviewers List Dally, W. J., Acacio, M. E., Agrawal, N., Altman, E., Alur, R., Baas, B.

5 Guest Editors’ Introduction: Hot Chips 21 Krste Asanovic and Ralph Wittig 7 Power7: IBM’s Next-Generation Server Processor Dally, W. J., Kalla, R., Sinharoy, B., Starke, W. J., Floyd, M., Conway, P.

31st Annual International Symposium on Computer Architecture ISCA 2004 Dally, W. J., Agerwala, T., Taylor, M., Lee, W., Miller, J., Wentzlaff, D.

21st century digital design tools Dally, W. J., Malachowsky, C., Keckler, S. W. 2013

A 0.54 pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications Solid-State Circuits Conference Digest of Technical Papers (ISSCC) Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G. 2013

A detailed and flexible cycle-accurate network-on-chip simulator Performance Analysis of Systems and Software (ISPASS) Jiang, N., Becker, D. U., Michelogiannakis, G., Balfour, J., Towles, B., Shaw, D. E. 2013

A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications IEEE Poulton, J. W., Dally, W. J., Chen, X., Eyles, J. G., Greer, T. H., Tell, S. G. 2013

Composition and reuse with compiled domain-specific languages Dally, W. J., Sujeeth, A. K., Rompf, T., Brown, K. J., Lee, H., Chafi, H. 2013

Optimizing data structures in high-level programs: new directions for extensible compilers based on staging Rompf, T., Sujeeth, A. K., Amin, N., Brown, K. J., Jovanovic, V., Lee, H. 2013

Channel reservation protocol for over-subscribed channels and destinations Michelogiannakis, G., Jiang, N., Becker, D., Dally, W. J. 2013

Article 8-A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors ACM Transactions on Computer Systems-TOCS Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2012; 2 (30): 38

A case of system-level hardware/software co-design and co-verification of a commodity multi-processor system with custom hardware Dally, W. J., Hong, S., Oguntebi, T., Casper, J., Bronson, N., Kozyrakis, C. 2012

Digital Design: A Systems Approach Dally, W. J., Harting, R. C. Cambridge University Press. 2012

Green-Marl: A DSL for Easy and Efficient Graph Analysis ASPLOS XVII: SEVENTEENTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS Hong, S., Chafi, H., Sedlar, E., Olukotun, K. 2012: 349-362

Unifying primary cache, scratch, and register file memories in a throughput processor Gebhart, M., Keckler, S. W., Khailany, B., Krashinsky, R., Dally, W. J. 2012

4 Guest Editor’s Introduction: CPUs, GPUs, and Hybrid Computing David Brooks 7 GPUs and the Future of Parallel Computing Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D., Rohr, D. 2011

Guaranteeing forward progress of unified register allocation and instruction scheduling Technical Report Concurrent VLSI Architecture Group Memo 127, Stanford Park, J., Dally, W. J. 2011

GPUs and the future of parallel computing Micro, IEEE Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., Glasco, D. 2011; 5 (31): 7-17

Energy-efficient mechanisms for managing thread context in throughput processors ACM SIGARCH Computer Architecture News Gebhart, M., Johnson, D. R., Tarjan, D., Keckler, S. W., Dally, W. J., Lindholm, E. 2011; 3 (39): 235-246

2011 Index IEEE Computer Architecture Letters Vol. 10 Computer Architecture Letters Becker, D., Choi, I., Cooper-Balis, E., Dally, W. J., Devadas, S., Duato, J. 2011; 53: 56

Circuit challenges for future computing systems Dally, W. J. 2011

Liszt: a domain specific language for building portable mesh-based PDE solvers DeVito, Z., Joubert, N., Palacios, F., Oakley, S., Medina, M., Barrientos, M. 2011

A compile-time managed multi-level register file hierarchy Gebhart, M., Keckler, S. W., Dally, W. J. 2011

2010 IEEE Symposium on Asynchronous Circuits and Systems Dally, W. J., Tell, S. G. 2010

Throughput computing Dally, W. J. 2010

Evaluating bufferless flow control for on-chip networks Michelogiannakis, G., Sanchez, D., Dally, W. J., Kozyrakis, C. 2010

The even/odd synchronizer: A fast, all-digital, periodic synchronizer Asynchronous Circuits and Systems (ASYNC), 2010 IEEE Symposium on Dally, W. J., Tell, S. G. 2010: 75-84

Moving the needle, computer architecture research in academe and industry ACM SIGARCH Computer Architecture News Dally, W. J. 2010; 3 (38): 1-1

Booksim 2.0 User’s Guide Stanford University Jiang, N., Michelogiannakis, G., Becker, D., Towles, B., Dally, W. J. 2010

Fine-grain dynamic instruction placement for L0 scratch-pad memory Park, J., Balfour, J., Dally, W. J. 2010

Block-Parallel Programming for Real-time Embedded Applications WJ 2010

Apparatus and method for packet scheduling US Patent Dally, W. J., Carvey, P. P., Beliveau, P. A., Mann, W. F., Dennison, L. R. 2010; 760 (7): 747

The GPU Computing Era (HTML) Nickolls, J., Dally, W. J. 2010

The GPU computing era Micro, IEEE Nickolls, J., Dally, W. J. 2010; 2 (30): 56-69

The end of denial architecture and the rise of throughput computing Keynote speech at Design Automation Conference Dally, W. J. 2010

The end of denial architecture and the rise of throughput computing Dally, W. J. 2010

Exascale software study: Software challenges in extreme scale systems DARPA IPTO, Air Force Research Labs Amarasinghe, S., Campbell, D., Carlson, W., Chien, A., Dally, W., Elnozahy, E. 2009

Indirect adaptive routing on large scale interconnection networks ACM SIGARCH Computer Architecture News Jiang, N., Kim, J., Dally, W. J. 2009; 3 (37): 220-231

Router designs for elastic buffer on-chip networks Michelogiannakis, G., Dally, W. J. 2009

Power efficient supercomputing Accelerator-based Computing and Manycore Workshop (presentation) Dally, W. J. 2009; 1

Allocator implementations for network-on-chip routers Becker, D. U., Dally, W. J. 2009

Maximizing the Filter Rate of L0 Compiler-Managed Instruction Stores by Pinning Technical Report 126, Concurrent VLSI Architecture Group, Stanford University Park, J., Balfour, J., Dally, W. J. 2009

Stream Processors Multicore Processors and Systems Erez, M., Dally, W. J. 2009: 231-270

Load-balanced routing US Patent Singh, A., Dally, W. J. 2009; 633 (7): 940

Embracing heterogeneity–parallel programming for changing hardware Linderman, M. D., Balfour, J., Meng, T. H., Dally, W. J. 2009

Elastic-buffer flow control for on-chip networks High Performance Computer Architecture Michelogiannakis, G., Balfour, J., Dally, W. J. 2009

Hierarchical instruction register organization Computer Architecture Letters Black-Schaffer, D., Balfour, J., Dally, W., Parikh, V., Park, J. S. 2008; 2 (7): 41-44

A tuning framework for software-managed memory hierarchies Ren, M., Park, J. Y., Houston, M., Aiken, A., Dally, W. J. 2008

An energy-efficient processor architecture for embedded systems Computer Architecture Letters Balfour, J., Dally, W. J., Black-Schaffer, D., Parikh, V., Park, J. S. 2008; 1 (7): 29-32

Exascale computing study: Technology challenges in achieving exascale systems Kogge, P., Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W. 2008

A programmable 512 GOPS stream processor for signal, image, and video processing Solid-State Circuits, IEEE Journal Khailany, B. K., Williams, T., Lin, J., Long, E. P., Rygh, M., Tovey, D. F., Dally, B. 2008; 1 (43): 202-213

Structured Application-Specific Integrated Circuit (ASIC) Study STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W., Balfour, J., Black-Schaffer, D., Hartke, P. 2008

Exascale computing study: Technology challenges in achieving exascale systems Bergman, K., Borkar, S., Campbell, D., Carlson, W., Dally, W., Denneau, M. 2008

Flattened butterfly: a cost-efficient topology for high-radix networks ACM SIGARCH Computer Architecture News Kim, J., Dally, W. J., Abts, D. 2007; 2 (35): 126-137

Research Challenges for On-Chip Interconnection Networks (HTML) Owens, J. D., Dally, W. J., Ho, R., Jayasimha, D. N., Keckler, S. W., Peh, L. S. 2007

Executing irregular scientific applications on stream architectures Erez, M., Ahn, J. H., Gummaraju, J., Rosenblum, M., Dally, W. J. 2007

A 14mW 6.25 Gb/s transceiver in 90nm CMOS for serial chip-to-chip communications Palmer, R., Poulton, J., Dally, W. J., Eyles, J., Fuller, A. M., Greer, T. 2007

Architectural support for the stream execution model on general-purpose processors Gummaraju, J., Erez, M., Coburn, J., Rosenblum, M., Dally, W. J. 2007

Stream Scheduling: A Framework to Manage Bulk Operations in a Memory Hierarchy Parallel Architecture and Compilation Techniques Das, A., Dally, W. J. 2007

Interconnect-Centric Computing. HPCA Keynote Dally, W. J. 2007; 1

Tradeoff between data-, instruction-, and thread-level parallelism in stream processors Ahn, J., Erez, M., Dally, W. J. 2007

Future directions for on-chip interconnection networks OCIN Workshop Dally, W. J. 2006

Sequoia: programming the memory hierarchy Fatahalian, K., Horn, D., Knight, T., Leem, L., Houston, M., Park, J., Dally, B. 2006

Multi-Core for HPC: Breakthrough or Breakdown? Sterling, T., Kogge, P., Dally, W., Scott, S., Gropp, W., Keyes, D. 2006

Topology optimization of interconnection networks Computer Architecture Letters Gupta, A. K., Dally, W. J. 2006; 1 (5): 10-13

Prefix search method US Patent Waters, G. M., Dennison, L. R., Carvey, P. P., Dally, W. J., Mann, W. F. 2006; 130 (7): 847

DRAFT Final Report: Workshop on On- and Off-Chip Networks for Multi-Core Systems Retrieved from: http://www.ece.ucdavis.edu/~ocin06 Dally, W. 2006

Compiling for stream processing Das, A., Dally, W. J., Mattson, P. 2006

Data parallel address architecture Computer Architecture Letters Ahn, J. H., Dally, W. J. 2006; 1 (5): 30-33

Adaptive routing in high-radix Clos network Kim, J., Dally, W. J., Abts, D. 2006

Pulsenet-A Parallel Flash Sampler and Digital Processor IC for Optical SETI Custom Integrated Circuits Conference, 2006. CICC'06. IEEE Howard, A. W., Wei, G. Y., Dally, W. J., Horowitz, P. 2006: 261-264

Design tradeoffs for tiled CMP on-chip networks Balfour, J., Dally, W. J. 2006

The design space of data-parallel memory systems Ahn, J. H., Erez, M., Dally, W. J. 2006

Fault tolerance techniques for the merrimac streaming supercomputer Erez, M., Jayasena, N., Knight, T. J., Dally, W. J. 2005

11th International Symposium on High-Performance Computer Architecture (HPCA'05) Ahn, J. H., Erez, M., Dally, W. J. 2005

Globally adaptive load-balanced routing on tori Computer Architecture Letters Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2004; 1 (3): 2-2

Streams and vectors: A memory system perspective 6th WorkShop on Media and Streaming Processors Jayasena, N., Dally, W. J. 2004

High-Speed Logic, Circuits, Libraries and Layout Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J., Chinnery, D., Keutzer, K., Zlatanovici, R. 2004: 101-144

The case for broader computer architecture education: keynote address Dally, W. J. 2004

Buffer and delay bounds in high radix interconnection networks Computer Architecture Letters Singh, A., Dally, W. J. 2004; 1 (3): 8-8

Adaptive channel queue routing on k-ary n-cubes Singh, A., Dally, W. J., Gupta, A., Towles, B. 2004

Stream processors: Programmability and efficiency Queue Dally, W. J., Kapasi, U. J., Khailany, B., Ahn, J. H., Das, A. 2004; 1 (2): 52

Principles and practices of interconnection networks Access Online via Elsevier Dally, W. J., Towles, B. P. 2004

How scaling will change processor architecture Solid-State Circuits Conference, 2004. Digest of Technical Papers. Horowitz, M., Dally, W. 2004

Exploiting Structure and Managing Wires to Increase Density and Performance Closing the Gap Between ASIC & Custom Chang, A., Dally, W. J. 2004: 269-287

Analysis and performance results of a molecular modeling application on Merrimac Erez, M., Ahn, J. H., Garg, A., Dally, W. J., Darve, E. 2004

Space-efficient source routing Carvey, P., Dally, W., Dennison, L., King, P., Mann, W. 2004

The Ninth International Symposium on High-Performance Computer Architecture (HPCA'03) Khailany, B., Dally, W. J., Rixner, S., Kapasi, U. J., Owens, J. D., Towles, B. 2003

Merrimac: Supercomputing with streams Dally, W. J., Labonte, F., Das, A., Hanrahan, P., Ahn, J. H., Gummaraju, J. 2003

Prefix search method Carvey, P., Dally, W., Dennison, L., Mann, W., Waters, G. 2003

A second-order semi-digital clock recovery circuit based on injection locking Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC Lee, M. J., Dally, W. J., Poulton, J., Greer, T., Edmondson, J., Farjad-Rad, R. 2003

A 33mW 8Gb/s CMOS clock multiplier and CDR for highly integrated I/Os Ng, H. T., Lee, M. J., Farjad-Rad, R., Senthinathan, R., Dally, W. J., Nguyen, A. 2003

Methods and apparatus for event-driven routing Carvey, P., Dally, W., Dennison, L., King, P. 2003

0.622-8.0 Gbps 150 mW serial IO macrocell with fully flexible preemphasis and equalization VLSI Circuits, 2003. Digest of Technical Papers. 2003 Symposium on Farjad-Rad, R., Ng, H. T., Lee, M. J., Senthinathan, R., Dally, W. J., Nguyen, A. 2003: 63-66

Throughput-centric routing algorithm design Towles, B., Dally, W. J., Boyd, S. 2003

CMOS high-speed I/Os-present and future Lee, M. J., Dally, W. J., Farjad-Rad, R., Ng, H. T., Senthinathan, R., Edmondson, J. 2003

Migration in single chip multiprocessors Computer Architecture Letters Shaw, K. A., Dally, W. J. 2002; 1 (1): 12-12

Locality-preserving randomized oblivious routing on torus networks Singh, A., Dally, W. J., Towles, B., Gupta, A. K. 2002

Comparing Reyes and OpenGL on a stream architecture Owens, J. D., Khailany, B., Towles, B., Dally, W. J. 2002

Prefix search circuitry and method Carvey, P., Dally, W., Dennison, L., Mann, W., Waters, G. 2002

Internet switch router Carvey, P., Dally, W., Dennison, L., King, P. 2002

Computer architecture is all about interconnect High-Perf. Comp. Architecture Dally, W. J. 2002

Worst-case traffic for oblivious routing functions Towles, B., Dally, W. J. 2002

Stream Processing for High-Performance Embedded Systems Defense Technical Information Center Dally, W. J. 2002

Method and system for guaranteeing quality of service in large capacity input output buffered cell switch based on minimum bandwidth guarantees and weighted fair share of unused bandwidth Dally, W., Meempat, G., Ramamurthy, G. 2002

Worst-case Traffic for Oblivious Routing Functions (PDF) Towles, B., Dally, W. J. 2002

A 0.2-2 GHz 12 mW multiplying DLL for low-jitter clock synthesis in highly-integrated data communication chips Farjad-Rad, R., Dally, W., Ng, H. T., Poulton, J., Stone, T., Rathi, R. 2002

Guest Editors' Introduction: Hot Chips 12 (HTML) Dally, W. J., Tremblay, M., Baum, A. J. 2001

Elastic interconnects: Repeater-inserted long wiring capable of compressing and decompressing data Mizuno, M., Dally, W., Onishi, H. 2001

Monolithic chaotic communications system Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Chiang, P., Dally, W., Lee, E. 2001

Guest Editors' Introduction: Hot Chips 12 IEEE MICRO Baum, A. J., Dally, W. J., Tremblay, M. 2001; 2 (21): 13-15

Scalable switching fabrics for Internet routers White paper, Avici Systems Inc Dally, W. J. 2001

A streaming supercomputer Whitepaper Dally, W. J., Hanrahan, P., Fedkiw, R. 2001

A single-chip terabit switch Hot Chips Dally, W. J., Dettloff, W., Eyles, J., Greer, T., Poulton, J., Stone, T. 2001; 13

A Delay Model for Router Microarchitectures (HTML) Peh, L. S., Dally, W. J. 2001

Smart memories: A modular reconfigurable architecture ACM SIGARCH Computer Architecture News Mai, K., Paaske, T., Jayasena, N., Ho, R., Dally, W. J., Horowitz, M. 2000; 2 (28): 161-171

Flit-reservation flow control Peh, L. S., Dally, W. J. 2000

Stream Scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Dally, W. J., Mattson, P., Kapasi, U. J., Owens, J. D., Towles, B. 2000

10 Subspace Optimizations Knobe, K., Dally, W. J. edited by Kessler, Christoph, W. 2000

Stream scheduling STANFORD UNIV CA COMPUTER SYSTEMS LAB Kapasi, U. J., Mattson, P., Dally, W. J., Owens, J. D., Towles, B. 2000

Sixth International Symposium on High-Performance Computer Architecture Peh, L. S., Dally, W. J. 2000

Memory access scheduling ISCA Owens, J. D., Mattson, P., Kapasi, U. J., Dally, W. J., Rixner, S. 2000; 128

Register organization for media processing Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. 2000

Polygon rendering on a stream architecture Owens, J. D., Dally, W. J., Kapasi, U. J., Rixner, S., Mattson, P., Mowery, B. 2000

A 90 mW 4 Gb/s equalized I/O circuit with input offset cancellation Lee, M. J., Dally, W., Chiang, P. 2000

Sixth International Symposium on High-Performance Computer Architecture Rixner, S., Dally, W. J., Khailany, B., Mattson, P., Kapasi, U. J., Owens, J. D. 2000

Computer Architecture for the Next Millennium Dally, W. J. 1999

GAD: A 12-GS/s CMOS 4-bit A/D converter for an equalized multi-level link Ellersick, W., Yang, C. K., Horowitz, M., Dally, W. J. 1999

Interconnect-limited VLSI architecture Interconnect Technology, 1999. IEEE International Conference Dally, W. J. 1999: 15-17

20th Anniversary Conference on Advanced Research in VLSI Dally, W. J., Lacy, S. 1999

Point sample rendering Massachusetts Institute of Technology Dally, W. J., Grossman, J. P. 1998

VLSI datapath choices: Cell-based versus full-custom Massachusetts Institute of Technology Chang, A. L. 1998

Tomorrow’s Computing Engines keynote speech, Fourth Int’l Symp. High-Performance Computer Architecture Dally, W. 1998

The j-machine: A retrospective Retrospective in Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998: 54-58

An efficient, protected message interface Computer Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998; 11 (31): 69-75

Digital systems engineering Cambridge university press Dally, W. J., Poulton, J. W. 1998

Architecture of a message-driven processor 25 years of the international symposia on Computer architecture (selected Dally, W. J., Chao, L., Chien, A., Hassoun, S., Horwat, W., Kaplan, J. 1998

Architecture of the Avici terabit switch/router Dally, W., Carvey, P., Dennison, L. 1998

Digital Systems Engineering Poulton, J. W., Dally, W. J. Cambridge University Press. 1998

Efficient, protected message interface in the MIT M-Machine IEEE Computer Special Issue on Design Challenges for High-Performance Lee, W. S., Dally, W. J., Keckler, S. W., Carter, N. P., Chang, A. 1998

An instruction scheduling algorithm for communication-constrained microprocessors Massachusetts Institute of Technology Dally, W. J., Buehler, C. J. 1998

The Fifth International Conference on Massively Parallel Processing Using Optical Interconnections Dally, W. J., Lee, M. J., An, F. T., Poulton, J., Tell, S. 1998

Point sample rendering Rendering Techniques Grossman, J. P., Dally, W. J. 1998; 98: 181-192

Media Processors 1999 (Proceedings Volume) Dally, W. J., Fritts, J. E., Wolf, W. H., Liu, B., Bove Jr, V. M., Lee, M. 1998

Media processing using streams Electronic Imaging Rixner, S., Dally, W. J., Kapasi, U. J., Khailany, B., Lopez-Lagunas, A., Mattson, P. R. 1998: 122-134

The J-Machine ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE Dally, W. J., Chang, A., Chien, A., Fiske, S., Horwat, W., Keen, J. 1998; 25: 54-58

Retrospective: the J-machine Dally, W. J., Chien, A., Fiske, S., Horwat, W., Lethin, R., Noakes, M. 1998

Invited Talks Coldren, L. A., Dally, W. J. 1998

Message-driven dynamics Massachusetts Institute of Technology Dally, W. J., Lethin, R. A. 1997

Transmitter equalization for 4-Gbps signaling Micro, IEEE Dally, W. J., Poulton, J. 1997; 1 (17): 48-56

The m-machine multicomputer International Journal of Parallel Programming Fillo, M., Keckler, S. W., Dally, W. J., Carter, N. P., Chang, A., Gurevich, Y. 1997; 3 (25): 183-212

The delta tree: An object-centered approach to image-based rendering Dally, W. J., McMillan, L., Bishop, G., Fuchs, H. 1997

Extended ephemeral logging: log storage management for applications with long lived transactions ACM Transactions on Database Systems (TODS) Keen, J. S., Dally, W. J. 1997; 1 (22): 1-42

Design of the Configuration and Diagnostic Units of the MAP Chip Massachusetts Institute of Technology Dally, W. J., Klayman, K. 1997

An I/O port controller for the MAP chip Massachusetts Institute of Technology, Dept. of Electrical Engineering and Dally, W. J., Ma, A. 1997

Asynchronous event handling Massachusetts Institute of Technology Dally, W. J., Chatterjee, S. 1997

Advances in the M-machine runtime system Massachusetts Institute of Technology Dally, W. J., Shultz, A. 1997

TPDS Now Online! Special Issue Editors Old and New IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS Dally, W. J., Fortes, J. A. 1997; 3 (8): 225

Circuit designs for the MAP chip Massachusetts Institute of Technology Dally, W. J., Chen, A. R. 1997

1997 Annual Index, Vol. 17 development [single chip microprocessors] Dally, W. J., Adams, L., Anderson, T., Bilas, A., Biswas, B. B., Burger, D. 1997; 2000: 28-36

Flexible Memory Systems. (AASERT Fellowship). MASSACHUSETTS INST OF TECH CAMBRIDGE Carter, N., Dally, W. J. 1996

The subspace model: Shape-based compilation for parallel systems Massachusetts Institute of Technology Dally, W. J., Knobe, K. B. 1996

Architects Look to Processors of Future MICROPROCESSOR REPORT, MICRODESIGN RESOURCES Bell, G., Sites, R., Dally, W., Ditzel, D., Patt, Y. 1996; 10 (10)

Multiprocessor coupling system with integrated compile and run time scheduling for parallelism US Patent Keckler, S. W., Dally, W. J. 1996; 574 (5): 939

Bandwidth, Granularity, and Mechanisms: Key Issues in the Design of Parallel Computers Dally, W. J. 1996

Flexible Memory Systems. (AASERT Fellowship) MASSACHUSETTS INST OF TECH CAMBRIDGE Dally, W. J., Carter, N. 1996

A data-driven IDCT architecture for low power video applications Xanthopoulos, T., Chandrakasan, A. P., Sodini, C. G., Dally, W. J. 1996

Evaluating the locality benefits of active messages ACM SIGPLAN Notices Spertus, E., Dally, W. J. 1995; 8 (30): 189-198

Thread prioritization: A thread scheduling mechanism for multiple-context parallel processors Future Generation Computer Systems Fiske, S., Dally, W. J. 1995; 6 (11): 503-518

The M-Machine Multicomputer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Keckler, S. W., Fillo, M., Carter, N. P., Chang, A. 1995

1st IEEE Symposium on High-Performance Computer Architecture Nuth, P. R., Dally, W. J. 1995

Low-latency plesiochronous data retiming Dennison, L. R., Dally, W. J., Xanthopoulos, D. 1995

Implementation of atomic primitives on distributed shared memory multiprocessors Dally, W. J., Michael, M. M., Scott, M. L. 1995

The M-Machine operating system Massachusetts Institute of Technology Dally, W. J., Gurevich, Y. 1995

The subspace model: A theory of shapes for parallel systems Knobe, K., Dally, W. J. 1995

Fault tolerant adaptive routing in multicomputer networks Massachusetts Institute of Technology Xanthopoulos, T. 1995

The named-state register file: Implementation and performance Nuth, P. R., Dally, W. J. 1995

Proceedings Dally, W. J., Poulton, J. W., Ishii, A. T. 1995

1st IEEE Symposium on High-Performance Computer Architecture Fiske, S., Dally, W. J. 1995

Issues in the Design and Implementation of Instruction Processors for Multicomputers (Position Statement) Multithreaded Computer Architecture Dally, W. J. 1994: 79-82

The implementation of a reliable router chip Massachusetts Institute of Technology Dally, W. J., Kan, K. H. 1994

The design of a high performance SPARC bus interface Massachusetts Institute of Technology Dally, W. J., Wong, D. F. 1994

Efficient message subsystem design Massachusetts Institute of Technology Dally, W. J., Lee, W. S. 1994

VLSI design for freshmen and sophomores Massachusetts Institute of Technology Dally, W. J., Harris, D. 1994

Subspace optimizations Automatic Parallelization Knobe, K., Dally, W. J. 1994: 153-176

M-Machine Microarchitecture v1. 11 Dally, W. J., Keckler, S. W., Carter, N., Chang, A., Fillo, M., Lee, W. S. 1994

Logging and recovery in a highly concurrent database Dally, W. J., Keen, J. S. 1994

The reliable router: A reliable and high-performance communication substrate for parallel computers Parallel Computer Routing and Communication Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994: 241-255

Named state and efficient context switching Multithreaded Computer Architecture Nuth, P. R., Dally, W. J. 1994: 201-212

Multithreaded computer architecture Boston: Kluwer Academic Publishers Dennis, J. B., Gao, G. R., Iannucci, R. A., Dally, W. J. 1994

Architecture and implementation of the Reliable Router Dally, W. J., Dennison, L. R., Harris, D., Kan, K., Xanthopoulos, T. 1994

A subspace optimizing data parallel compiler Massachusetts Institute of Technology Dally, W. J., Dampier, T. O. 1994

A numerical engine for distributed sparse matrices Massachusetts Institute of Technology Dally, W. J., Telichevesky, R. 1994

The design and implementation of an actor language based on linear logic Massachusetts Institute of Technology Dally, W. J., Tse, C. S. 1994

How to Choose the Grain Size of a Parallel Computer MIT/LCS Technical Report Yeung, D., Dally, W. J., Agarwal, A. 1994: MIT-LCS-TR-739

XEL: extended ephemeral logging for log storage management Keen, J. S., Dally, W. J. 1994

Hardware support for fast capability-based addressing ACM SIGPLAN Notices Carter, N. P., Keckler, S. W., Dally, W. J. 1994; 11 (29): 319-327

Deadlock-free adaptive routing in multicomputer networks using virtual channels Parallel and Distributed Systems, IEEE Transactions Dally, W. J., Aoki, H. 1993; 4 (4): 466-475

The J-machine multicomputer: an architectural evaluation ACM SIGARCH Computer Architecture News Noakes, M. D., Wallach, D. A., Dally, W. J. 1993; 2 (21): 224-235

Performance evaluation of ephemeral logging ACM SIGMOD Record Keen, J. S., Dally, W. J. 1993; 2 (22): 187-196

Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5 ACM SIGARCH Computer Architecture News Spertus, E., Goldstein, S. C., Schauser, K. E., Eicken, T. V., Culler, D. E., Dally, W. J. 1993; 3 (21): 302-313

COSMOS: An operating system for a fine-grain concurrent computer Research directions in concurrent object-oriented programming Horwat, W., Totty, B., Dally, W. J. 1993: 452-476

The J-Machine architecture and evaluation Compcon Spring'93, Digest of Papers. Dally, W. J., Keen, J. S., Noakes, M. D. 1993: 183-188

Message-driven processor in a concurrent computer US Patent Dally, W. J., Chien, A. A., Horwat, W. P., Fiske, S. 1993; 212 (5): 778

A Video Controller and Distributed Frame Buffer for the J-Machine Dally, W. J., McDonald, E. 1993

A universal parallel computer architecture New Generation Computing Dally, W. J. 1993; 3-4 (11): 227-249

High-performance bidirectional signalling in VLSI systems Dennison, L. R., Lee, W. S., Dally, W. J. 1993

Mechanisms for parallel computers Parallel Computing on Distributed Memory Multiprocessors Dally, W. J., Wills, D. S., Lethin, R. 1993: 3-25

The Future of Computing is Parallel Computer Science Department Dally, W. J. 1993

The J-machine: a fine-grain parallel computer Computing Systems in Engineering Dally, W. J., Chien, A., Davison, R., Fiske, J. A., Furman, S., Fyler, G. 1992; 1 (3): 7-15

Design and implementation of the Message-Driven Processor Dally, W. J., Ahmed, S., Carrick, P., Chien, A., Davison, R., Fiske, J. 1992

The message-driven processor: A multicomputer processing node with efficient mechanisms Micro, IEEE Dally, W. J., Fiske, J. A., Keen, J. S., Lethin, R. A., Noakes, M. D., Nuth, P. R. 1992; 2 (12): 23-39

The message driven processor: An integrated multicomputer processing element Computer Design: VLSI in Computers and Processor Dally, W. J., Chien, A., Fiske, J. A., Fyler, G., Horwat, W., Keen, J. S. 1992

Processor coupling: Integrating compile time and runtime scheduling for parallelism ACM SIGARCH Computer Architecture News Keckler, S. W., Dally, W. J. 1992; 2 (20): 202-213

INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE 1992 Scientific information bulletin Keckler, S. W., Dally, W. J. 1992; 4 (17): 35

Custom integrated circuits Custom Integrated Circuits Dally, W. J., Allen, J., Wyatt Jr, J. L., White, J. K., Devadas, S., Armstrong, R. C. 1992

A fast translation method for paging on top of segmentation Computers, IEEE Transactions Dally, W. J. 1992; 2 (41): 247-250

MDP design tools and methods Computer Design: VLSI in Computers and Processors Lethin, R. A., Dally, W. J. 1992: ICCD'92

Virtual-Channel Flow Control (PDF) Dally, W. J. 1992

The J-machine network Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1992

Pi: a parallel architecture interface Frontiers of Massively Parallel Computation, 1992., Fourth Symposium on the… Wills, D. S., Dally, W. J. 1992

Virtual-channel flow control Parallel and Distributed Systems, IEEE Transactions Dally, W. J. 1992; 2 (3): 194-205

Experiences Implementing Dataflow on a General-Purpose Parallel Computer. ICPP Spertus, E., Dally, W. J. 1991; 2: 231-235

A mechanism for efficient context switching Computer Design: VLSI in Computers and Processors Nuth, P. R., Dally, W. J. 1991: ICCD'91

Express cubes: improving the performance of k-ary n-cube interconnection networks Computers, IEEE Transactions Dally, W. J. 1991; 9 (40): 1016-1023

Experiments with Dataflow on a General-Purpose Parallel Computer. MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Spertus, E., Dally, W. J. 1991

Experiments with data flow on a general-purpose parallel computer. Memorandum report Massachusetts Inst. of Tech., Cambridge, MA (United States). Artificial Spertus, E., Dally, W. J. 1991

Experiments with Dataflow on a General-Purpose Parallel Computer MASSACHUSETTS INST OF TECH CAMBRIDGE ARTIFICIAL INTELLIGENCE LAB Dally, W. J., Spertus, E. 1991

System design of the J-Machine Noakes, M., Dally, W. J. 1990

Experience with concurrent aggregates (CA): Implementation and programming Chien, A. A., Dally, W. J. 1990

Advanced Research in VLSI: Proceedings of the Sixth MIT Conference; [papers Presented at the Sixth MIT Conference on Advanced Research in VLSI, Held in Cambridge, Mass., in 1990] Dally, W. J. 1990

The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms Dally, W. J., Davison, R., Fiske, J. A., Fyler, G., Keen, J. S., Lethin, R. A. 1990

Performance analysis of k-ary n-cube interconnection networks Computers, IEEE Transactions Dally, W. J. 1990; 6 (39): 775-785

Network and processor architecture for message-driven computers VLSI and Parallel Computation Dally, W. 1990: 140-222

Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE Agarwal, A., Dally, W. J., Devadas, S., Knight Jr, T. F., Leighton, F. T., Nabors, K. 1990

Concurrent aggregates (CA) ACM Sigplan Notices Chien, A. A., Dally, W. J. 1990; 3 (25): 187-196

Virtual-channel flow control Dally, W. J. 1990

Proceedings of the sixth MIT conference on Advanced research in VLSI Dally, W. J. 1990

Simultaneous bidirectional signalling for IC systems Computer Design: VLSI in Computers and Processors Lam, K., Dennison, L. R., Dally, W. J. 1990: ICCD'90

Critical Problems in Very Large Scale Computer Systems KURTZ LABS YELLOW SPRINGS OH Leighton, F. T., Knight, T. F., Agarwal, A., Dally, W. J., Devadas, S. 1990

A hardware logic simulation system Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions Agrawal, P., Dally, W. J. 1990

Express cubes: Improving the performance of k-ary n-cube interconnection networks MASSACHUSETTS INST OF TECH CAMBRIDGE LAB FOR COMPUTER SCIENCE Dally, W. J. 1989

Algorithms for accuracy enhancement in a hardware logic simulator Agrawal, P., Tutundjian, R., Dally, W. 1989

Universal mechanisms for concurrency PARLE'89 Parallel Architectures and Languages Europe Dally, W. J., Wills, D. S. 1989: 19-33

Experience with CST: Programming and Implementation MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J., Horwat, W. 1989

A fine-grain, message-passing processing node Concurrent Computations Dally, W. J. 1989: 375-389

The J-machine: a fine grain concurrent computer MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Chien, A., Fiske, S., Horwat, W., Keen, J. 1989

Micro-optimization of floating-point operations ACM SIGARCH Computer Architecture News Dally, W. J. 1989; 2 (17): 283-289

Experience with CST: Programming and implementation ACM SIGPLAN Notices Horwat, W., Chien, A. A., Dally, W. J. 1989; 7 (24): 101-109

A network element based fault tolerant processor Massachusetts Institute of Technology Abler, T. A. 1988

Fine-grain message passing concurrent computers Dally, W. 1988

The J-machine: System support for Actors MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J. 1988

Object-Oriented Concurrent Programming in CST MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chien, A. A., Dally, W. J. 1988

Message-Driven Processor architecture, Version 11. Artificial intelligence memo Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988

Message-Driven Processor Architecture MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W., Chien, A., Fiske, S., Horwat, W., Keen, J. 1988

Critical Problems in Very Large Scale Computer Systems MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Knight, T. F., Penfield, P., Glasser, L. A., Agarwal, A., Dally, W. J. 1988

Critical problems in very-large-scale computer systems. Semiannual technical report, 1 April-30 September 1988 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Agarwal, A., Dally, W. J., Devadas, S., Knight, T. F. 1988

Object-oriented concurrent programming in CST Dally, W. J., Chien, A. A. 1988

The reconfigurable arithmetic processor ACM SIGARCH Computer Architecture News Fiske, S., Dally, W. J. 1988; 2 (16): 30-36

Mechanisms for Concurrent Computing FGCS Dally, W. J. 1988: 154-156

The Reconfigurable Arithmetic Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Dally, W. J., Fiske, S. 1988

The Balanced Cube A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 27-73

Architecture and design of the MARS hardware accelerator Agrawal, P., Dally, W. J., Ezzat, A. K., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. 1987

Performance analysis of k-ary n-cube interconnection networks NASA STI/Recon Technical Report N Dally, W. J. 1987; 88: 30010

MARS: A multiprocessor-based programmable accelerator Design & Test of Computers, IEEE Agrawal, P., Dally, W. J., Fischer, W. C., Jagadish, H. V., Krishnakumar, A. S., Tutundjian, R. 1987; 5 (4): 28-36

Graph Algorithms A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 75-132

Deadlock-free message routing in multiprocessor interconnection networks Computers, IEEE Transactions Dally, W. J., Seitz, C. L. 1987; 5 (100): 547-553

A coherent VLSI environment Massachusetts Inst. of Tech. Report Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T. 1987

Concurrent Smalltalk A VLSI Architecture for Concurrent Data Structures Dally, W. J. 1987: 13-25

A message passing system for a fault tolerant parallel processor Massachusetts Institute of Technology Dally, W. J., Heyda, R. L. 1987

A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE Abelson, H., Penfield, P., Antoniadis, D. A., Dally, W. J., Fonstad, C. G. 1987

Design of a self-timed VLSI multicomputer communication controller NASA STI/Recon Technical Report Dally, W. J., Song, P. 1987; 88: 30014

Coherent VLSI environment. Semiannual technical report, 1 October 1986-31 March 1987 Massachusetts Inst. of Tech., Cambridge (USA). Microsystems Research Center Penfield, P., Dally, W. J., Glasser, L. A., Knight, T. F., Leighton, F. T. 1987

A coherent VLSI design environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Penfield Jr, P., Dally, W. J., Glasser, L. A., Knight Jr, T. F., Leighton, F. T., Wyatt Jr, J. L. 1987

Architecture of a Message-Driven Processor MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Chao, L., Dally, W. J., Chien, A., Hassoun, S., Horwat, W. 1987

A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leighton, F. T., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1987

Concurrent computer architecture Massachusetts Inst. of Tech., Cambridge (USA). Artificial Intelligence Lab. Dally, W. J. 1987

The torus routing chip Distributed computing Dally, W. J., Seitz, C. L. 1986; 4 (1): 187-196

On the Performance of k-ary n-cube Interconnection Networks California Institute of Technology Dally, W. J. 1986

5208: TR: _86 Dally, W. J. 1986

The torus routing chip Dally, W. J., Seitz, C. L. 1986

A High-performance VLSI Quaternary Serial Multiplier Dally, W. J. 1986

Wire-efficient VLSI multiprocessor communication networks Massachusetts Institute of Technology, Microsystems Program Office Dally, W. J. 1986

Directions in concurrent computing Dally, W. J. 1986

A Coherent VLSI Design Environment MASSACHUSETTS INST OF TECH CAMBRIDGE MICROSYSTEMS RESEARCH CENTER Leiserson, C. E., Penfield, P., Glasser, L. A., Knight, T. F., Dally, W. J. 1986

VLSI architecture for concurrent data structures California Inst. of Tech. Dally, W. J. 1986

Concurrent Algorithms for the Max-Flow Problem California Institute of Technology Dally, W. J. 1985

A hardware architecture for switch-level simulation Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions Dally, W. J., Bryant, R. E. 1985

The balanced cube: a concurrent data structure California Institute of Technology Dally, W. J., Seitz, C. L. 1985

Fungicides for Crop Protection: Invited papers International Specialized Book Service Incorporated Dally, W. J., Smith, I. M. 1985

An object oriented architecture ACM SIGARCH Computer Architecture News Dally, W. J., Kajiya, J. T. 1985; 3 (13): 154-161

The MOSSIM Simulation Engine Architecture and Design California Institute of Technology Dally, W. J. 1984

A Special Purpose Processor for Switch-Level Simulation International Conference on Computer Aided Design Dally, W. J., Bryant, R. E. 1984

William Dally

Willard R. and Inez Kerr Bell Professor in the School of Engineering and Professor (Research) of Electrical Engineering

Computer Science
