The International Technology Roadmap for Semiconductors (ITRS) projects a trend towards system-level, platform-based design involving a large percentage of design reuse. Different Intellectual Property (IP) cores, including processors and memories, are interconnected to build a typical System-on-Chip (SoC) architecture. Larger SoC designs force data communication onto global interconnects. At 45 nm and below, interconnects have a profound impact on performance (and power) due to increased delays and cross-coupling from multiple sources. Hence, attaining timing closure with reasonable performance and low power is becoming increasingly impractical. In addition, traditional bus-based interconnection architectures present a synchronization nightmare in a heterogeneous SoC environment. At the system level, the performance of a shared bus starts to deteriorate as the number of cores increases. The Network-on-Chip (NoC) has been proposed as a new design paradigm to solve the communication and performance bottlenecks in modern SoC designs. Unlike the shared-bus approach, the central idea of an NoC is to interconnect the various IP cores using on-chip packet-switched networks. Owing to reduced development costs, shorter design cycles and shorter time-to-market, reconfigurable devices, especially FPGAs, are increasingly being used in low- and medium-volume applications in place of their ASIC counterparts. Because of the scalability issues of the shared bus, the NoC is gaining attention in the latest FPGA-based SoCs. In spite of these advantages, an NoC is still a shared network and suffers from bottlenecks involving hop latency, congestion, bandwidth violations and increased area. In this thesis, we propose and implement novel techniques to realize efficient Networks-on-Chip on commercial Xilinx FPGAs.
We present tangible solutions to the issues that hinder efficient Network-on-Chip implementation on the reconfigurable fabric. First, we concentrate on the area overhead issue, the solutions to which yielded many-fold advantages. Area is at a premium on an FPGA, and therefore the communication network should be as small as possible. The area of the on-chip micro-network can be reduced by: (1) using a simple router without sacrificing performance, and (2) reducing the number of routers. Implementing the first idea, we develop a Light-weight Parallel Router (LiPaR) with multiple optimizations that result in a significant reduction in logic area usage. The highlight of this dissertation is the realization of the second idea: a novel router design that can serve multiple logic cores simultaneously without any performance penalty. The new Multi Local Port Router (MLPR) provides many-fold advantages, including reductions in area, power, transit time and congestion and, most importantly, bandwidth optimization, resulting in an efficient, high-performance NoC design. Essentially, the MLPR is a marriage between switch-based and router-based interconnection networks. An NoC system comprising MLPRs represents a complex design environment, and hence generating an efficient NoC configuration is a great challenge. We present optiMap, an exhaustive-search-based algorithm that finds optimal solutions, and cMap, a fast heuristic-based mapping algorithm. The results show a dramatic reduction in latency as well as in the number of packets flowing in the NoC mesh. All the ports of an MLPR share the same mesh coordinate, which provides an opportunity to multicast to all the cores attached to the same MLPR; exploiting this, we present a modified router architecture called the Multi2 Router.
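To make the mapping problem concrete, the following is a minimal illustrative sketch, not the optiMap or cMap algorithm itself. It assumes a 2D mesh in which the hop count between two routers is their Manhattan distance, a cost equal to the sum of bandwidth times hop count over all communicating core pairs, and a hypothetical `ports_per_router` limit standing in for the number of local ports on an MLPR. The function names, the cost model and the traffic figures are all assumptions made for illustration.

```python
from itertools import product

def hops(a, b):
    """Manhattan distance between two mesh coordinates (assumed hop metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mapping_cost(assignment, routers, traffic):
    """Sum of bandwidth * hop count over all communicating core pairs.

    Cores attached to the same router share a mesh coordinate, so their
    traffic contributes zero hops to this cost.
    """
    return sum(
        bw * hops(routers[assignment[src]], routers[assignment[dst]])
        for (src, dst), bw in traffic.items()
    )

def best_mapping(num_cores, routers, ports_per_router, traffic):
    """Brute-force search over every core-to-router assignment."""
    best_cost, best = None, None
    for assignment in product(range(len(routers)), repeat=num_cores):
        # Skip assignments that exceed the local-port limit of any router.
        if any(assignment.count(r) > ports_per_router
               for r in range(len(routers))):
            continue
        cost = mapping_cost(assignment, routers, traffic)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, assignment
    return best_cost, best

# Hypothetical 2x2 mesh and a four-core task graph (made-up bandwidths).
routers = [(0, 0), (0, 1), (1, 0), (1, 1)]
traffic = {(0, 1): 100, (1, 2): 50, (2, 3): 100, (0, 3): 20}

single_cost, single_map = best_mapping(4, routers, 1, traffic)  # one core per router
multi_cost, multi_map = best_mapping(4, routers, 2, traffic)    # two local ports per router

print(single_cost, single_map)
print(multi_cost, multi_map)
```

With these made-up numbers, allowing two cores per router lowers the best achievable cost from 270 to 70, because the two heaviest flows stay inside a single router. The brute-force loop grows exponentially with the number of cores, which is the usual motivation for pairing an exhaustive search with a faster heuristic.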
Utilizing the multicast capability, we present an energy-efficient approach to NoC configuration generation (uMap) that targets a reduction in data packet traffic in the network. Optimizing for performance, latency or area favors adding more ports to a single router. However, after extensive experimentation, we find a point of diminishing returns in power efficiency when using larger MLPRs. In addition to the increase in average power, we observe several IR drop violations as the port count increases, which presents a tradeoff between performance and power efficiency. Task graphs in modern SoCs are not static, and the variations in intercommunication patterns and bandwidth requirements need to be taken into consideration during NoC architecture generation. Hence, we present a technique to estimate the Minimum BandWidth Guarantee (MBWG) required for a given topology and extend it to find the NoC architecture that minimizes the MBWG, thereby helping to prevent unexpected bandwidth violations. In summary, the dissertation presents novel and efficient router designs supported by NoC architecture generation algorithms. Although developed with an FPGA bias, the ideas proposed in this research are equally applicable to ASICs, and thus advance an efficient NoC design flow.