

Open Source Project Ideas


We are happy to announce we have been selected as a GSoC mentor organization. Please refer to our proposal template and application process for details on how to apply for a GSoC contributor slot.

This list of project ideas is meant for students and other up-and-coming developers interested in contributing to participating University of California-based open source projects. The list below notes each UC-based open source project and the ideas associated with it, as submitted by a UC-affiliated mentor. If you have any questions, please visit our Gitter channel:
https://gitter.im/uccross/gsoc


LiveHD

Projects for LiveHD. Lead Mentors: Jose Renau and Sheng-Hong Wang

HIF Tooling

   
| Field | Value |
|---|---|
| Title | HIF tooling |
| Description | Tools around Hardware Interchange Format (HIF) files |
| Mentor(s) | Jose Renau |
| Skills | C++17 |
| Difficulty | Medium |
| Size | Medium (175 hours) |
| Link | |

HIF (https://github.com/masc-ucsc/hif) stands for Hardware Interchange Format. It is designed to be an efficient binary representation with a simple API that supports the generic graph and tree representations commonly used by hardware tools. It is not designed to be a universal format, but rather a storage and traversal format for hardware tools.

LiveHD has two HIF interfaces: the tree (LNAST) and the graph (Lgraph). Both can read/write the HIF format. The idea of this project is to expand the hif repository with some small but useful tools around HIF. Some projects:

Mockturtle

   
| Field | Value |
|---|---|
| Title | Mockturtle |
| Description | Perform synthesis for graphs in LiveHD using Mockturtle |
| Mentor(s) | Jose Renau |
| Skills | C++17, synthesis |
| Difficulty | Medium |
| Size | Medium (175 hours) |
| Link | |

There are some issues with the Mockturtle integration (new cells), and it does not use the latest Mockturtle library version. The goal is to use Mockturtle (https://github.com/lsils/mockturtle) with LiveHD. The main characteristics:

Query Shell

   
| Field | Value |
|---|---|
| Title | Query Shell |
| Description | Create a console app that interacts with LiveHD to query parameters about designs |
| Mentor(s) | Jose Renau |
| Skills | C++17 |
| Difficulty | Medium |
| Size | Medium (175 hours) |
| Link | |

Wavedrom and duh allow dumping bitfield information for structures. It would be interesting to explore dumping tables and bit fields for Lgraph IOs, and for structs/fields inside the module. It may be a way to integrate with the documentation generation.

Example queries: show a path, show the driver/sink of a node, do a topological traversal, …

An interesting extension would be to embed a simple scripting language (TCL or ChaiScript or ???) to control queries more easily and allow building functions/libraries.
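
For illustration, here is a minimal sketch in Python of the kind of console interface the project would build. The real tool would be written in C++17 against LiveHD's APIs; the command names here are hypothetical stubs.

```python
import cmd

class QueryShell(cmd.Cmd):
    """Illustrative shape of a design-query console; commands are stubs."""
    prompt = "lgraph> "

    def do_driver(self, pin):
        """driver <pin> : show the driver of a pin."""
        print(f"driver of {pin}: <would query the Lgraph here>")

    def do_topo(self, _arg):
        """topo : print nodes in topological order."""
        print("<would run a topological traversal here>")

    def do_quit(self, _arg):
        return True

if __name__ == "__main__":
    QueryShell().cmdloop()
```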

Lgraph and LNAST check pass

   
| Field | Value |
|---|---|
| Title | Lgraph and LNAST check pass |
| Description | Create a pass that checks the integrity/correctness of Lgraph and LNAST |
| Mentor(s) | Jose Renau |
| Skills | C++17 |
| Difficulty | Medium |
| Size | Large (350 hours) |
| Link | |

Create a pass that checks that the Lgraph (and/or LNAST) is semantically correct. The LNAST already has quite a few tests (pass.semantic), but it can be further expanded. Some checks:

unbitwidth

   
| Field | Value |
|---|---|
| Title | unbitwidth |
| Description | Not all variables need bitwidth information; find the small subset that does |
| Mentor(s) | Jose Renau |
| Skills | C++17 |
| Difficulty | Medium |
| Size | Medium (175 hours) |
| Link | |

This pass is needed to create less verbose CHISEL and Pyrope code generation.

The Lgraph can have bitwidth information for each dpin. This is needed for Verilog code generation, but not for Pyrope or CHISEL: CHISEL can perform local bitwidth inference and Pyrope can perform global bitwidth inference.

A new pass should remove redundant bitwidth information. The information is redundant because pass/bitwidth can regenerate it if there are enough details. The goal is to create a pass/unbitwidth that removes either local or global bitwidth information, leaving enough for the bitwidth pass to regenerate the rest.

Eusocial Storage Devices

As storage devices get faster, data management tasks rob the host of CPU cycles and main memory bandwidth. The Eusocial project aims to create a new interface to storage devices that can leverage existing and new CPU and main memory resources to take over data management tasks like availability, recovery, and migrations. The project refers to these storage devices as “eusocial” because we are inspired by eusocial insects like ants, termites, and bees, which as individuals are primitive but collectively accomplish amazing things.

Dynamic function injection for RocksDB

Recent research reveals that the compaction process in RocksDB can be altered to optimize future data access by changing the data layout in compaction levels. The benefit of this approach can be extended to different data layout optimizations based on application access patterns and requirements. In this project, we want to create an interface that allows users to dynamically inject layout optimization functions into RocksDB, using containerization technologies such as WebAssembly.
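
As a rough sketch of the injection mechanism (not RocksDB-specific), the wasmtime Python bindings can load a user-supplied WebAssembly module at runtime and call into it from the host. The exported `layout_score` function below is a made-up stand-in for a layout optimization policy.

```python
from wasmtime import Store, Module, Instance

# A tiny user-supplied policy module, written inline as WAT text.
WAT = """
(module
  (func (export "layout_score") (param i64) (result i64)
    local.get 0
    i64.const 2
    i64.mul))
"""

store = Store()
module = Module(store.engine, WAT)        # compile the injected module
instance = Instance(store, module, [])
layout_score = instance.exports(store)["layout_score"]
print(layout_score(store, 21))            # host calls the injected policy
```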

Demonstrating a composable storage system accelerated by memory semantic technologies

Over the last decade, the slowdown in performance improvements of general-purpose processors has driven system architectures to become increasingly heterogeneous. Domain-specific accelerator hardware (e.g., FPGA, SmartNIC, TPU, GPU) is taking over many different jobs from general-purpose processors. At the same time, network and storage device performance has improved tremendously, on a trajectory that far outpaces that of processors. With this trend, a natural way to continue scaling storage system performance economically is to efficiently utilize and share resources from different nodes over the network. Several resource-sharing protocols already exist, such as CCIX, CXL, and GEN-Z. Among these, GEN-Z is the most interesting because, unlike RDMA, it enables remote memory access without exposing details to applications (i.e., no application changes). It would therefore be interesting to see how, whether, and to what extent these technologies can help improve the performance of storage systems. This project requires building a demo system that uses some of these technologies (especially GEN-Z) and running selected applications/workloads to better understand the benefits.

When will Rotational Media Users abandon SATA and converge to NVMe?

Goal: Determine the benefits, in particular market verticals such as genomics and health care, of converging the storage stack in data center computer systems to the NVMe device interface, even when devices include rotational media (aka disk drives). The key question: “When do people abandon SATA and SAS and converge to NVMe?”

Background: NVMe is a widely used device interface for fast storage devices such as flash that behave much more like random access memory than traditional rotational media. Rotational media is accessed mostly via SATA and SAS, which have served the industry well for close to two decades. SATA in particular is much cheaper than NVMe. Now that NVMe is widely available and quickly advancing in functionality, an interesting question is whether there is a market for rotational media devices with NVMe interfaces, converging the storage stack to only one logical device interface, thereby enabling a common ecosystem and more efficient connectivity from multiple processes to storage devices.

The NVMe 2.0 specification, which came out last year, has been restructured to support the increasingly diverse NVMe device environment (including rotational media). The extensibility of 2.0 encourages enhancements of independent command sets such as Zoned Namespaces (ZNS) and Key Value (NVMe-KV) while supporting transport protocols for NVMe over Fabrics (NVMe-oF). A lot of creative energy is now focused on advancing NVMe while SATA has not changed in 16 years. Having all storage devices connect the same way not only frees up space on motherboards but also enables new ways to manage drives, for example via NVMe-oF that allows drives to be networked without additional abstraction layers.

Suggested Project Structure: This is really just a suggestion for a starting point. As research progresses, a better structure might emerge.

  1. Convergence of software stack: seamless integration between rotational media and hot storage
    • Direct tiering: one unified interface to place data among fast and slow devices on the same NVMe fabric depending on whether the data is hot or cold.
  2. Computational storage:
    • What are the architectures of computational NVMe devices? For example, offloading compute to an FPGA vs an onboard processor in a disk drive?
    • Do market verticals such as genomics and health care favor one over the other? When do people abandon SATA and converge to NVMe?

Project tasks:

Interesting links:
https://www.opencompute.org/wiki/Storage/NVMeHDD
https://2021ocpglobal.fnvirtual.app/a/event/1714 (video and slides, requires $0 registration)
https://www.storagereview.com/news/nvme-hdd-edges-closer-to-reality
https://www.tomshardware.com/news/seagate-demonstrates-hdd-with-pcie-nvme-interface
https://nvmexpress.org/everything-you-need-to-know-about-the-nvme-2-0-specifications-and-new-technical-proposals/
https://www.tomshardware.com/news/nvme-2-0-supports-hard-disk-drives

SmartNICs

SmartNICs have become one of the important components in heterogeneous system architectures, offloading computation to accelerate host functions mostly found in networking, storage, security, and management services. Generally speaking, a SmartNIC is programmable, and therefore it is “smart.” Though programmability can have different definitions or implementations across vendors, the hardware blocks of a SmartNIC that enable programmability are highly specialized (e.g., compression and encryption engines). Therefore, SmartNICs can be more efficient than host CPUs at particular jobs. The fundamental problem is how applications can gain performance benefits by leveraging this new hardware.

Employ kernel-bypass techniques in Apache Arrow Flight

Apache Arrow provides a set of tools for efficient in-memory analytics and processing of columnar data. Arrow Flight is a data transport framework for streaming Arrow data across different services in a cluster. It is built on gRPC, Google’s popular HTTP/2-based RPC library. The most common deployment of Arrow Flight runs on top of the kernel TCP/IP stack, which is known to be much slower than kernel-bypass stacks such as DPDK and RDMA. Given that there are already RPC solutions over these kernel-bypass stacks (e.g., eRPC and Seastar), the goal of this project is to allow Arrow Flight to leverage the higher performance of these stacks instead of traditional gRPC.
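
For context, a minimal Arrow Flight service over the default gRPC transport looks like this in Python; the project would keep this API surface while replacing the transport underneath. Table contents and the port are illustrative.

```python
import pyarrow as pa
import pyarrow.flight as flight

class TinyServer(flight.FlightServerBase):
    def do_get(self, context, ticket):
        # Stream a small table back to the client over gRPC.
        table = pa.table({"sensor": [1, 2, 3], "value": [0.1, 0.2, 0.3]})
        return flight.RecordBatchStream(table)

server = TinyServer(location="grpc://0.0.0.0:8815")
# server.serve()   # blocks; then, in another process:
#   client = flight.connect("grpc://localhost:8815")
#   table = client.do_get(flight.Ticket(b"")).read_all()
```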

Arrow data filtering using regular expression accelerator

The current implementation of row-based data filtering in Apache Arrow uses host CPUs. Offloading this filtering function to SmartNICs can save host CPU cycles, returning them to user applications while potentially gaining better performance. The [BlueField-2 DPU](https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/documents/datasheet-nvidia-bluefield-2-dpu.pdf) provides a built-in RegEx accelerator for Ethernet network packets. This project is to develop the integration that allows Apache Arrow to leverage this accelerator, and eventually to characterize the performance benefits in throughput, latency, and energy consumption.
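
For reference, the CPU-side path that would be offloaded looks like this with Arrow's compute kernels; the column name and pattern are illustrative.

```python
import pyarrow as pa
import pyarrow.compute as pc

table = pa.table({"url": ["/api/v1/items", "/static/app.js", "/api/v2/users"]})
mask = pc.match_substring_regex(table["url"], pattern=r"^/api/v\d+/")  # CPU regex kernel
print(table.filter(mask))   # rows whose url matches the pattern
```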

Open Source Autonomous Vehicle Controller

The OSAVC is a vehicle-agnostic open source hardware and software project. This project is designed to provide a real-time hardware controller adaptable to any vehicle type, suitable for aerial, terrestrial, marine, or extraterrestrial vehicles. It allows control researchers to develop state estimation algorithms, sensor calibration algorithms, and vehicle control models in a modular fashion such that, once the hardware set has been developed, switching algorithms requires only modifying one C function and recompiling.

Lead mentor: Aaron Hunter

Projects for the OSAVC:

Vehicle/Craft sensor driver development

Help develop a sensor library for use in autonomous vehicles. Possible sensors include range finders, ping sensors, IMUs, GPS receivers, RC receivers, barometers, air speed sensors, etc. Code will be written in C using state machine methodology and non-blocking algorithms. Test the drivers on a Microchip microcontroller.

Path finding algorithm using OpenCV and machine learning

Use OpenCV to identify a track for an autonomous vehicle to follow. Build on previous work by developing a new model using EfficientDet and an existing training set of images. Port the model to TFLite and implement it on the Coral USB Accelerator. Evaluate its performance against our previous efforts.

State estimation/sensor fusion algorithm development

Implement an optimal state estimation algorithm from a model. This model can be derived from a Kalman filter or some other state estimation filter (e.g., a Mahony filter). The model takes sensor readings as input and provides an estimate of the state of a vehicle. Finally, convert the model to standard C using Simulink code generation, or implement it in Python (for use on a single-board computer, e.g., Raspberry Pi).
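
For a flavor of the task, here is a minimal one-dimensional Kalman filter in Python; the real OSAVC models are multi-dimensional and would be generated in C as described above.

```python
import random

def kalman_1d(readings, q=1e-3, r=0.5):
    x, p = 0.0, 1.0            # state estimate and its variance
    for z in readings:
        p += q                 # predict: process noise grows the variance
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # correct the estimate toward the measurement
        p *= 1 - k             # the update shrinks the variance
        yield x

readings = [1.0 + random.gauss(0, 0.5) for _ in range(50)]
print(list(kalman_1d(readings))[-1])   # converges toward the true value 1.0
```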

SkyhookDM


The Skyhook Data Management project extends object storage with data management functionality for tabular data. SkyhookDM enables storing and querying tabular data in the Ceph distributed object storage system. It thereby turns Ceph into an Apache Arrow-native storage system, utilizing the Arrow Dataset API to store and query data with server-side data processing, including selection and projection, which can significantly reduce the data returned to the client.

SkyhookDM is now part of Apache Arrow (see blog post).


Support reading from Skyhook in Dask/Ray using the Arrow Dataset API

Problem: Dask and Ray are parallel-computing frameworks similar to Apache Spark, but in the Python ecosystem. Each of these frameworks supports reading tabular data from different data sources such as a local filesystem, cloud object stores, etc. These systems have recently added support for the Arrow Dataset API to read data from different sources. Since the Arrow Dataset API supports Skyhook, we can leverage this capability to offload compute-heavy Parquet file decoding and decompression into the Ceph storage layer. This can speed up queries significantly, as CPU cycles are freed up in the Dask/Ray workers for other processing tasks.
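
As a sketch of the offload path, following the SkyhookDM blog post: the paths and ceph.conf location are illustrative, and `SkyhookFileFormat` requires a Skyhook-enabled Arrow build.

```python
import pyarrow.dataset as ds

# Scans through this format are pushed down into the Ceph OSDs.
fmt = ds.SkyhookFileFormat("parquet", "/etc/ceph/ceph.conf")
dataset = ds.dataset("file:///mnt/cephfs/nyc-taxi", format=fmt)
table = dataset.to_table(
    columns=["fare", "distance"],        # projection pushed to the OSDs
    filter=ds.field("fare") > 10.0,      # selection pushed to the OSDs
)
```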

Implement Gandiva based query executor in SkyhookDM

Problem: Gandiva allows efficient evaluation of query expressions using runtime code generation with LLVM. The generated code leverages SIMD instructions and is highly optimized for parallel processing on modern CPUs. It is natively supported by Arrow for compiling and executing expressions. SkyhookDM currently uses the Arrow Dataset API (which internally uses the Arrow Compute APIs) to execute query expressions inside the Ceph OSDs. Since the Arrow Dataset API does not currently support Gandiva, the goal of this project is to add Gandiva support to the Arrow Dataset API in order to accelerate query processing when offloaded to the storage layer. This will help Skyhook combat some of the performance issues caused by Arrow's inefficient serialization interface.

References:

Add ability to create and save views from Datasets

Problem: Workloads may repeat the same or similar queries over time. This causes repetition of I/O and compute operations, wasting resources. Saving previous computation in the form of materialized views can benefit future workload processing.

Solution: Add a method to the Dataset API to create views from queries and save each view as an object in a separate pool, with an object key generated from the query that created it.

Reference: https://docs.dremio.com/working-with-datasets/virtual-datasets.html
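
A hypothetical shape for the proposed API, with the view key derived from the query; neither `create_view` nor `view` exists in the Dataset API today.

```python
import hashlib

# A deterministic object key generated from the query definition.
query = {"columns": ["fare"], "filter": "fare > 10"}
view_key = hashlib.sha256(repr(sorted(query.items())).encode()).hexdigest()

# dataset.create_view(view_key, **query)   # materialize into a views pool
# view = dataset.view(view_key)            # later workloads reuse the result
print(view_key[:16])
```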


Integrating Delta Lake on top of SkyhookDM

Delta Lake is a new architecture for querying big data lakes through Spark, providing transactions. An important benefit of this integration will be an SQL interface to SkyhookDM functionality, through Spark SQL. This project will further build upon our current work connecting Spark to SkyhookDM through the Arrow Dataset API. It would allow us to easily run some of the TPC-DS queries (a popular set of SQL queries for benchmarking databases) on SkyhookDM.

Reference: [Delta Lake paper](https://databricks.com/jp/wp-content/uploads/2020/08/p975-armbrust.pdf)


Proactive Data Containers (PDC)

Proactive Data Containers (PDC) are containers within a locus of storage (memory, NVRAM, disk, etc.) that store science data in an object-oriented manner. Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning.

Python interface to an object-centric data management system

Proactive Data Containers (PDC) is an object-centric data management system for scientific data on high performance computing systems. It manages objects and their associated metadata within a locus of storage (memory, NVRAM, disk, etc.). Managing data as objects enables powerful optimization opportunities for data movement and transformations, and storage mechanisms that take advantage of the deep storage hierarchy and enable automated performance tuning. Currently PDC has a C interface. Providing a Python interface would make it easier for Python applications to utilize it.
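
One possible starting point for the binding is ctypes over the PDC C library. `PDCinit` is part of the PDC C API, though the exact signatures should be checked against the headers; everything else here is illustrative.

```python
import ctypes

# Assumes a PDC installation providing libpdc.so.
libpdc = ctypes.CDLL("libpdc.so")
libpdc.PDCinit.argtypes = [ctypes.c_char_p]
libpdc.PDCinit.restype = ctypes.c_uint64     # pdcid_t is an opaque 64-bit id

pdc_id = libpdc.PDCinit(b"pdc")              # start a PDC instance
# Next: wrap PDCcont_create / PDCobj_create the same way, then expose them
# behind pythonic Container and Object classes.
```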


HDF5

HDF5 is a unique technology suite that makes possible the management of extremely large and complex data collections.

The HDF5 technology suite includes:

Python Interface to HDF5 Asynchronous I/O

HDF5 is a well-known library for storing and accessing (known as “Input and Output” or I/O) data on high-performance computing systems. Recently, new technologies, such as asynchronous I/O and caching, have been developed to utilize fast memory and storage devices and to hide the I/O latency. Applications can take advantage of an asynchronous interface by scheduling I/O as early as possible and overlapping computation with I/O operations to improve overall performance. The existing HDF5 asynchronous I/O feature supports the C/C++ interface. This project involves the development and performance evaluation of a Python interface that would allow more Python-based scientific codes to use and benefit from the asynchronous I/O.
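
For contrast, the existing synchronous path through h5py looks like this; the async variant sketched in the comments is hypothetical and only illustrates how I/O could overlap computation.

```python
import h5py
import numpy as np

data = np.random.rand(1024, 1024)
with h5py.File("checkpoint.h5", "w") as f:
    f.create_dataset("step_0", data=data)   # blocks until the write completes

# Hypothetical async variant the project could expose:
#   req = f.create_dataset_async("step_0", data=data)
#   compute_next_step()                     # overlaps computation with I/O
#   req.wait()                              # block only when the data is needed
```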

CephFS

CephFS is a distributed file system on top of Ceph. It is implemented as a distributed metadata service (MDS) that uses dynamic subtree balancing to trade parallelism for locality during continually changing workloads. Clients that mount a CephFS file system connect to the MDS and acquire capabilities as they traverse the file namespace. Capabilities not only convey metadata but can also implement strong consistency semantics by granting and revoking the ability of clients to cache data locally.

CephFS namespace traversal offloading

The frequency of metadata service (MDS) requests relative to the amount of data accessed can severely affect the performance of distributed file systems like CephFS, especially for workloads that randomly access a large number of small files as is commonly the case for machine learning workloads: they purposefully randomize access for training and evaluation to prevent overfitting. The datasets of these workloads are read-only and therefore do not require strong coherence mechanisms that metadata services provide by default.

The key idea of this project is to reduce the frequency of MDS requests by offloading namespace traversal, i.e. the need to open a directory, list its entries, open each subdirectory, etc. Each of these operations usually requires a separate MDS request. Offloading namespace traversal refers to a client’s ability to request the metadata (and associated read-only capabilities) of an entire subtree with one request, thereby offloading the traversal work for tree discovery to the MDS.

Once the basic functionality is implemented, this project can be expanded to address optimization opportunities, e.g. describing regular tree structures as a closed form expression in the tree’s root, shortcutting tree discovery.
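
The following Python sketch illustrates why this matters: a client-side walk costs roughly one metadata request per directory, whereas an offloaded walk would cost one request for the whole subtree. `subtree_stat` is hypothetical; `os.walk` stands in for the per-directory round trips a CephFS client performs today.

```python
import os

def count_dir_requests(root):
    requests = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        requests += 1          # each directory listing = one MDS request
    return requests

print(count_dir_requests("/tmp"))  # grows with the number of directories
# With offloading (hypothetical API): one request for the whole subtree.
# metadata, caps = subtree_stat(root)
```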

OpenROAD - A Complete, Autonomous RTL-GDSII Flow for VLSI Designs

OpenROAD is a front-runner in open-source semiconductor design automation tools and know-how. OpenROAD reduces barriers of access and tool costs to democratize system and product innovation in silicon. The OpenROAD tool and flow provide an autonomous, no-human-in-the-loop, 24-hour RTL-GDSII capability to support low-overhead design exploration and implementation through tapeout. We welcome a diverse community of designers, researchers, enthusiasts and entrepreneurs who use and contribute to OpenROAD to make a far-reaching impact. Our mission is to democratize and advance design automation of semiconductor devices through leadership, innovation, and collaboration.

OpenROAD is the key enabler of successful chip initiatives like the Google-sponsored Efabless program that has made possible more than 150 successful tapeouts by a diverse and global user community. The OpenROAD project repository is https://github.com/The-OpenROAD-Project/OpenROAD.

Designing static RAMs for good performance and area in VLSI designs is generally time-consuming. Memory compilers significantly reduce design time for complex analog and mixed-signal designs by allowing designers to explore, verify, and configure multiple variants and hence select a design that is optimal for area and performance. This project adds support for memory compilers, such as those provided by OpenRAM, to OpenROAD-flow-scripts based on popular PDKs.

OpenLane Memory Design Macro Floorplanning

Improve and verify OpenLane design planning with OpenRAM memories. Specifically, this project will utilize the macro placer/floorplanner and resolve any issues for memory placement. Issues that will need to be addressed may include power supply connectivity, ability to rotate memory macros, and solving pin-access issues.

OpenLane Memory Design Timing Analysis

Improve and verify OpenLane Static Timing Analysis using OpenRAM generated library files. Specifically, this will include verifying setup/hold conditions as well as creating additional checks such as minimum period, minimum pulse width, etc. Also, the project will add timing information to the Verilog behavioral model.

OpenLane Memory Macro PDK Support

Integrate and verify FreePDK45 OpenRAM memories with an OpenLane FreePDK45 design flow. OpenLane currently supports only the Skywater 130nm PDK, but OpenROAD supports FreePDK45 (which is the same as Nangate45). This project will create a design using OpenRAM memories with the OpenLane flow using FreePDK45.

VLSI Power Planning and Analysis

Take the existing power planning (pdngen.tcl) module of OpenROAD and recode the functionality in C++, ensuring that all of the unit tests on the existing code pass correctly. Work with a senior member of the team at ARM. Ensure that designs created are of good quality for power routing and overall power consumption.

Demos and Tutorials

For OpenLane, develop demos showing: the OpenLane flow, highlighting key features; GUI visualizations; design explorations and experiments; and different design styles and particular challenges.

Comprehensive Flow Testing

Develop detailed test plans to test the OpenLane flow, expanding coverage and advanced features. Add open source designs to the regression test suite to improve tool quality and robustness. This includes design specification, configuration, and creation of all necessary files for regression testing. Suggested sources: ICCAS benchmarks, OpenCores, LSOracle for the synthesis flow option.

Enhance GUI features

For OpenROAD, develop and enhance visualizations for EDA data and algorithms in the OpenROAD GUI. Allow deeper understanding of the tool results for users and tool internals for developers.

Automate OpenDB code Generation

For OpenROAD, automate code generation for the OpenDB database, which allows improvements to the data model with much less hand coding. Allow the generation of storage, serialization, and callback code from a custom schema description format.

Implement an NLP based AI bot aimed at increasing users, enhancing usability and building a knowledge base

The OpenROAD project contains a storehouse of knowledge in its GitHub repositories, within issues and pull requests. Additionally, project-related Slack channels hold useful information in the form of questions and answers, and problems and solutions, in conversation threads. Implement an AI analytics bot that filters and selects relevant discussions and classifies/records them into useful documentation and actionable issues. The bot should also directly track and increase project usage and report outcome metrics.

Package Management & Reproducibility

Projects related to reproducibility and package management, especially as it relates to store type package managers (NixOS, Guix or Spack).

Lead Mentor: Farid Zakaria fmzakari@ucsc.edu

Investigate the dynamic linking landscape

Dynamic linking as specified in the ELF file format has gone unchallenged since its invention. With many new package management models that eschew the filesystem hierarchy standard (i.e. Nix, Guix and Spack), many of the idiosyncrasies that define the way in which libraries are discovered are no longer useful and are potentially harmful.
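
As a starting experiment, pyelftools makes it easy to inspect how a binary declares its dependencies: the DT_NEEDED entries that the runtime linker must resolve through exactly the search rules this project questions.

```python
from elftools.elf.elffile import ELFFile   # pip install pyelftools

with open("/bin/ls", "rb") as f:
    dynamic = ELFFile(f).get_section_by_name(".dynamic")
    for tag in dynamic.iter_tags():
        if tag.entry.d_tag == "DT_NEEDED":
            print("needs", tag.needed)     # resolved via RPATH/RUNPATH,
                                           # ld.so.cache, default dirs, ...
```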

Specific tasks:

Apache AsterixDB

AsterixDB is an open source parallel big-data management system (http://asterixdb.apache.org/). AsterixDB is a well-established Apache project that has been active in research for more than 10 years. It provides a flexible data model that supports modern NoSQL applications with a powerful query processor that can scale to billions of records and terabytes of data. Users can interact with AsterixDB through a powerful and easy-to-use declarative query language, SQL++, which provides a rich set of data types including timestamps, time intervals, text, and geospatial, in addition to traditional numerical and Boolean data types.
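
For example, a SQL++ statement can be submitted to a running AsterixDB instance through its HTTP query service (port 19002 by default); the dataset name below is illustrative.

```python
import requests

sqlpp = """
SELECT c.primary_type, COUNT(*) AS n
FROM chicago_crimes c
GROUP BY c.primary_type
ORDER BY n DESC
LIMIT 5;
"""
resp = requests.post("http://localhost:19002/query/service",
                     data={"statement": sqlpp})
print(resp.json()["results"])
```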

Geospatial Data Science on AsterixDB

Build a data science project using AsterixDB that analyzes geospatial data among other dimensions. Use Chicago Crimes as the main dataset and combine it with other datasets, including points of interest and ZIP Code boundaries. During this project, we will answer interesting questions about the data and visualize the results, such as:

The goals of this project are:

Machine Learning Integration

As a bonus task, and depending on the progress of the project, we can explore the integration of machine learning with AsterixDB through Python UDFs. We will utilize the AsterixDB Python integration through user-defined functions to connect the AsterixDB backend with scikit-learn to build some unsupervised and supervised models for the data. For example, we can cluster the crimes based on their location and other attributes to find interesting patterns or hotspots.
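
A minimal sketch of the modeling step, clustering synthetic crime coordinates with scikit-learn; this is the kind of function a Python UDF could wrap so AsterixDB can call it server-side.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for crime coordinates within Chicago's bounding box.
coords = np.random.uniform([41.6, -87.9], [42.0, -87.5], size=(1000, 2))
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(coords)
print(np.bincount(labels))   # number of incidents per hotspot cluster
```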

FasTensor

FasTensor is a parallel execution engine for user-defined functions on multidimensional arrays. The user-defined functions follow the stencil metaphor used in scientific computing and are effective for expressing a wide range of computations for data analysis, including common aggregation operations from database management systems and advanced machine learning pipelines. The FasTensor execution engine exploits the structural locality in multidimensional arrays to automate data management operations such as file I/O, data partitioning, communication, and parallel execution.
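
The stencil metaphor is easy to show in a few lines of Python/numpy; FasTensor's C++ engine applies the same idea while automating chunking, parallelism, and file I/O.

```python
import numpy as np

def apply_stencil(arr, udf, radius=1):
    """Apply udf to the (2*radius+1)^2 neighborhood around every cell."""
    padded = np.pad(arr, radius, mode="edge")
    out = np.empty_like(arr)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            out[i, j] = udf(padded[i:i + 2 * radius + 1,
                                   j:j + 2 * radius + 1])
    return out

smoothed = apply_stencil(np.random.rand(8, 8), np.mean)  # 3x3 mean filter
```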

Continuous Integration

Polyphorm / PolyPhy

Polyphorm is an agent-based system for reconstructing and visualizing optimal transport networks defined over sparse data. Rooted in astronomy and inspired by nature, we have used Polyphorm to reconstruct the Cosmic web structure, but also to discover network-like patterns in natural language data. You can find more details about our research here. Under the hood, Polyphorm uses a richer 3D scalar field representation of the reconstructed network, instead of a discrete representation like a graph or a mesh.

PolyPhy will be a Python-based, redesigned version of Polyphorm, currently at the beginning of its development cycle. PolyPhy will be a multi-platform toolkit meant for a wide audience across different disciplines: astronomers, neuroscientists, data scientists, and even artists and designers. All of the offered projects focus on PolyPhy, with a variety of topics including design, coding, and even research. Ultimately, PolyPhy will become a tool for discovering connections between different disciplines by creating quantitatively comparable structural analytics.

Develop website for PolyPhy

Develop a clean and welcoming website for the project. The organization needs to reflect the needs of PolyPhy users, but also provide a convenient entry point for interested project contributors. No excessive pop-ups or webjunk.

Specific tasks:

Design visual experience for PolyPhy’s website and presentations

Develop visual content for the project using its main themes: nature-inspired computation, biomimetics, interconnected structures. Aid in designing visual structure of the website as well as other public-facing artifacts.

Specific tasks:

Write PolyPhy’s technical story and content

Integral to PolyPhy’s presentation is a story that the users and the project contributors can relate to. The objective is to develop the verbal part of that story, as well as major portions of technical documentation that matches it. The difficulty of the project is scalable.

Specific tasks:

Video tutorials and presentation for PolyPhy

Create a public face for PolyPhy that reflects its history, context, and teaches its functionality to users in different degrees of familiarity.

Specific tasks:

Implement heterogeneous data I/O ops

By default, PolyPhy operates with an unordered set of points as input and scalar fields (float ndarrays) as output, but other modalities are applicable as well. Design and implement interfaces to load and export different data formats (CSV, OBJ, HDF5, FITS…) and modalities (points, meshes, density fields). The difficulty of the project can be scaled based on the contributor’s interest. A sketch of one loader/exporter pair is shown below.
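
A sketch of one loader/exporter pair; file names are illustrative, and OBJ or FITS handlers would follow the same shape.

```python
import numpy as np
import h5py

def load_points_csv(path):
    return np.loadtxt(path, delimiter=",")          # (N, 3) point coordinates

def export_field_hdf5(field, path):
    with h5py.File(path, "w") as f:
        f.create_dataset("density", data=field, compression="gzip")

points = np.random.rand(1000, 3)                    # stand-in for a catalog
field, _ = np.histogramdd(points, bins=(64, 64, 64))
export_field_hdf5(field.astype(np.float32), "field.h5")
```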

Specific tasks:

Setup CI/CD for PolyPhy

The objective is to set up a CI/CD pipeline that automates the build, testing, and deployment of the software. The resulting process needs to be robust to contributor errors and work in the distributed conditions of a diverse contributor base.

Specific tasks:

Refine PolyPhy’s UI and develop new functional elements

The key feature of PolyPhy is its interactivity. By interacting with the underlying simulation model, the user can adjust its parameters in real time and respond to its behavior. For instance, an astrophysics expert can load a dataset of 100k galaxies and reconstruct the large-scale structure of the intergalactic medium. A responsive UI combined with real-time visualization allows them to judge the fidelity of the reconstruction and make necessary changes.

Specific tasks:

Create new data visualization regimes

Data visualization is one of the core components of PolyPhy, as it provides a real-time overview of the underlying MCPM simulation. Through the feedback provided by the visualization, PolyPhy users can adjust the simulation model and make new findings about the dataset. Various operations over the reconstructed data (e.g. spatial searching) as well as important statistical summaries also benefit from clear visual presentation.

Specific tasks:

Discrete graph extraction from simulated scalar fields

Develop a custom method for graph extraction from scalar field data produced by PolyPhy. Because PolyPhy typically produces network-like structures, representing these structures as weighted discrete graphs is very useful for efficiently navigating the data. The most important property of this abstracted representation is that it preserves the topology of the base scalar field by navigating the 1D ridges of the scalar field.
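
As a deliberately naive baseline (not the proposed method), one can threshold the field and connect neighboring above-threshold cells into a weighted graph; a real extractor would trace the 1D ridges instead.

```python
import numpy as np
import networkx as nx

field = np.random.rand(32, 32)
mask = field > 0.9                      # keep only high-density cells
g = nx.Graph()
for i, j in zip(*np.nonzero(mask)):
    for di, dj in ((0, 1), (1, 0)):     # 4-connectivity, forward neighbors
        ni, nj = i + di, j + dj
        if ni < field.shape[0] and nj < field.shape[1] and mask[ni, nj]:
            g.add_edge((i, j), (ni, nj), weight=field[i, j] + field[ni, nj])
print(g.number_of_nodes(), g.number_of_edges())
```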

Specific tasks:

DirtViz

DirtViz is a project to visualize data collected from sensors deployed in sensor networks. We have deployed a number of sensors measuring quantities like soil moisture, temperature, current, and voltage in outdoor settings. This project involves extending (or replacing) our existing plotting scripts to create a fully-fledged dataviz tool tailored to the types of data collected from embedded systems sensor networks.
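
The kind of plot the tool would generalize, sketched with matplotlib on synthetic data; the channel names and units are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0, 48, 0.5)                     # hours since deployment
moisture = 30 + 5 * np.sin(t / 24 * 2 * np.pi) + np.random.randn(len(t))
voltage = 3.3 + 0.05 * np.random.randn(len(t))

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t, moisture)
ax1.set_ylabel("soil moisture (%)")
ax2.plot(t, voltage)
ax2.set_ylabel("cell voltage (V)")
ax2.set_xlabel("time (h)")
fig.savefig("sensors.png")
```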

Visualize Sensor Data

OpenRAM

OpenRAM is an award-winning open-source Python framework to create the layout, netlists, timing and power models, placement and routing models, and other views necessary to use SRAMs in ASIC design. OpenRAM supports integration in both commercial and open-source flows with both predictive and fabricable technologies. Most recently, it has created memories that are included on all of the eFabless/Google/Skywater MPW tape-outs.

Replace logging framework with library

Replace the custom logging framework in OpenRAM with the Python logging module. The new logging should allow levels of detail as well as tags to enable/disable logging of particular features to aid debugging.
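
A sketch of the levels-plus-tags idea on top of the standard logging module; the tag names are illustrative.

```python
import logging

class TagFilter(logging.Filter):
    """Keep records whose 'tag' attribute is enabled (or untagged records)."""
    def __init__(self, enabled):
        super().__init__()
        self.enabled = set(enabled) | {None}

    def filter(self, record):
        return getattr(record, "tag", None) in self.enabled

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("openram")
log.addFilter(TagFilter({"router"}))

log.debug("supply rail widths ok", extra={"tag": "router"})   # shown
log.debug("bitcell array placed", extra={"tag": "placer"})    # filtered out
```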

ROM generator

Use the OpenRAM API to generate a Read-Only Memory (ROM) from an input hex file. The project will automatically generate a Spice netlist, layout, Verilog model, and timing characterization.
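
The input-handling half might look like this; the OpenRAM API calls that would emit the Spice netlist, layout, and Verilog are not shown.

```python
def read_hex_words(path, word_bits=8):
    """Parse a plain one-word-per-line hex file into a bit matrix."""
    with open(path) as f:
        words = [int(line, 16) for line in f if line.strip()]
    return [[(w >> b) & 1 for b in range(word_bits)] for w in words]

# A file containing "DE\nAD\nBE\nEF" yields 4 rows of 8 bits each.
```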

Register File generator

Use the OpenRAM API to generate a Register File from standard library cells. The project will automatically generate a Spice netlist, layout, Verilog model, and timing characterization.

Built-In Self Test and Repair

Finish integration of a parameterized Verilog module to support Built-In Self-Test and Repair of OpenRAM memories using spare rows and columns.

Layout versus Schematic (LVS) visualization

Create a visualization interface to debug layout-versus-schematic mismatches in the Magic layout editor. Results will be parsed from the JSON output of Netgen.

Adaptive Load Balancers for Low-latency Multi-hop Networks

This project aims at designing efficient, adaptive link-level load balancers for networks that handle different kinds of traffic, in particular networks where flows are heterogeneous in terms of their round trip times. Geo-distributed data centers are one such example. With the large-scale deployments of 5G in the near future, there will be even more applications, including more bulk transfers of videos and photos, as well as augmented and virtual reality applications that take advantage of 5G's low-latency service. With the development of and discussion around Web3.0 and the Metaverse, network workloads across data centers are only going to get more varied and challenging. All of this adds to the heavy bulk of data being sent to data centers and over the backbone network.

This traffic has varying quality-of-service requirements, like low latency, high throughput, and high-definition video streaming. Wide area network (WAN) flows are typically data-heavy tasks that consist of backup data taken for a particular data center. The interaction of data center and WAN traffic creates a very interesting scenario with its own challenges to be addressed. WAN and data center traffic are characterized by differences in link utilizations and round trip times.

Based on readings and literature review, there seems to be very little work on load balancers that address the interaction of data center and WAN traffic. This motivates the need for designing load balancers that take into account both WAN and data center traffic in order to achieve high performance in more realistic scenarios. This work proposes a load balancer that adapts to the kind of traffic it encounters by learning from network conditions and then predicting the optimal route for a given flow.

Through this research we seek to contribute the following:

Adaptive, Dynamic Load Balancing for data center and WAN traffic

Specific tasks: